BIOMETRIKA


CONGENITAL ANOMALIES IN A NATIVE
AFRICAN RACE.


By HUGH STANNUS STANNUS, M.D. Lond., Medical Officer, Nyasaland.


(1) I HAVE thought it would be of interest to put on record some observations 
made by myself in Nyasaland during the past seven years, on the subject which 
appears as the title of this paper. 


These observations relate to members of a native population of Bantu stock, 
belonging to several main tribes, namely, Mananja, Yao, Ngoni and Tumbuka, 
with a few references to the Nkonde in the north and the Nguru from the south- 
east. 


My interest in the subject was aroused by the frequency with which some 
abnormalities were seen and I think the facts I bring forward will go to shew that 
this unusual incidence is real and not only the result of the ease with which 
observations may be made among a partially clothed community. 


Statistics dealing with the subject, to be of value, must treat of large numbers;
such have, however, only been possible in a few instances, to be referred to later.
I speak therefore largely from impressions in appraising the rarity or otherwise of 
any particular condition. It should be remembered in this direction that the 
cases now to be reported have been met with more or less casually, most of them 
while travelling on the path or in some village, few in the course of Native 
Hospital work and none in any Special Department. 

Classification is a matter of some difficulty for many reasons and as the number 
of anomalies to be described is not very large it is perhaps more convenient to 
consider the various conditions according to the anatomical part affected. 

One large section of congenital anomalies, Anomalies of Pigmentation, I have 
already dealt with (Biometrika, Vol. IX. pp. 333-365), and they will not be
touched on in the present paper. 

(2) Dealing with those deviations from the normal in which there is a change
of a more or less general nature, I refer firstly to Infantilism, at the same time 
recognising that such a condition may not constitute a truly congenital anomaly. 


To the class designated Idiopathic Infantilism I should relegate a woman
aged 22 years seen in 1911 at Zomba who presented the figure and development 
of a girl of 13. There was no breast development, no pubic or axillary hair and 
the rounded contours of the body and limbs usually associated with this age in a 
woman were wanting; menstruation had not commenced. In other respects she 
appeared normal and her mental development was but little if at all below the 
average. 


(3) In W. Nyasa I encountered a very excellent example of the Ateliotic Dwarf,
a perfect “little man,” a man in miniature 1.25 metres in height. Another case
which I think must be considered as one of simple dwarfism is here reproduced:
Samuti, aged 35, a Yao, 1.42 metres high. He is shewn together with a man of
1.85 metres. Samuti shews no other abnormality (Plate I, (1)).


No case of Cretinism or Myxoedematous Dwarfism has been seen. I may here 
mention that Cachectic Infantilism is well seen in some cases of spinal caries
among Natives just as among Europeans.


A paper on “Congenital Humeral Micromely” in the Nouvelle Iconographie 
de la Salpêtrière, T. XXIV. pp. 463-471, Paris 1911, by Dr S. A. Kinnier Wilson
and myself, contains references to two cases of Achondroplasia in Nyasaland. 
Since then I have heard of two other cases and seen a fifth :—Etimu, male, aged 
25 years, a Yao, son of Masinjiri of Ndindi’s near Chipoli, Dedza District. The 
subject stated that he had no children and that no member of the family was known 
to have been similarly affected. He is a perfect example of the condition as the 
photographs will attest, and further remarks are unnecessary (Plate I, (3) and (4)). 


The following measurements were made and tracings of his hands are here
depicted (Fig. 1):

 (1) Head: maximum length                                      20.1 cm.
 (2)       breadth                                             [illegible]
 (3)       circumference                                       60.0
 (4) Nose: length, base to root                                3.6
 (5)       breadth, across nostrils                            4.5
 (6) Face: bizygomatic breadth                                 [illegible]
 (7)       length, nasion to chin                              [illegible]
 (8)       length, nasion to commissure of lips                6.7
 (9) Standing height                                           [illegible]
(10) Span of arms                                              [illegible]
(11) Arm: acromion to external condyle of humerus              [illegible]
(12) Forearm: external humeral condyle to tip of ulnar
       tubercle                                                [illegible] cm.
(13) Forearm to tip of middle finger                           [illegible]
(14) Leg: top of iliac crest to head of fibula                 [illegible]
(15)      top of iliac crest to external malleolus             [illegible]
(16)      top of iliac crest to sole of foot                   [illegible]
(17) Trunk: upper border of sternum to umbilicus               34 [?]
(18)        upper border of sternum to symphysis pubis         45

Fig. 1. Etimu. (Left and right hands.)


(4) No case of actual Gigantism has been seen. Tallness or shortness often 
runs in families. The tallest man I have ever seen measured 1.92 metres. He
was the father of an albinotic child and had internal strabismus but no signs of
acromegaly (see Plate I, (2)). Another man whom I have not seen but who was
measured by Dr Davey at Kota Kota was 2.0 metres in height. No case of
Acromegaly has been seen by myself. 


(5) The following case in the want of development of the lower jaw and 
zygomatic arches might be considered as the converse to acromegaly (Fig. 2). 
From the sketch the subject will at once be recognised as a type of Congenital 
Idiot, the above-mentioned features and ill-formed pinnae together with the rather 
bird-like appearance being characteristic. 

Jaidi, male, aged 20 years, a Yao of Chumbosa, Bursali, is the second child of
a family of three, the elder brother being dead and the younger sister normal. 
No family history was elicited. 


Fig. 2. Jaidi.


The growth of the face is defective as before noted, the zygomatic arches are 
so little developed that there are practically no cheeks. The descending rami of 
the jaws converge very considerably so that the floor of the mouth is very narrow 
and the horizontal rami are so short that the symphysis is situated mid-way be- 
tween the lower lip and the neck as they lie on one horizontal plane. The palate 
is high and narrow. 


The following measurements were made: 


Maximum occipito-frontal                                   19.[illegible] cm.
Maximum bi-parietal                                        13.8
Bizygomatic at junction of zygoma with temporal            12.3
Nose: length                                               4.7 [?]
      breadth                                              3.8 [?]
Face: nasion to commissure of lips                         [illegible]
      nasion to symphysis of chin                          [illegible]


Right external strabismus is present and vision defective. 

Though mentally an imbecile with an impaired speech he is an excellent field 
labourer. He states that no woman would marry him but that he has had sexual 
intercourse and that he is capable of the act. 

A few other cases of Congenital Idiocy have been seen and include an example
of Spastic Diplegia, a Mongol Idiot aged 4 years in W. Nyasa district and two 
microcephalic idiots met with in adjacent villages in Chikala district, in neither of 
which were factors of etiological interest elicited. 


(a) Aged 22, male, looked like a boy of 12 in physical development, the head 
was very small but no measurements were made; the palpebral fissures were
markedly slanting downwards and inwards and an internal strabismus was present ; 
the ears and palate were normal; the hands large and like those of a man. 


(b) A male infant aged one year with so marked a degree of microcephaly as
to approach in type anencephaly, the resemblance being the more marked as the
protuberant eyes and lips were like those characteristically found in anencephalic
monsters (Plate II, (7)).


(6) The following case is given at length (Plate II, (5) and (6)). 


Masimosya, aged 19 years (1911), a Yao of Chipi’s village Zomba, exhibits a 
marked want of development of sexual organs (male) associated with large breasts. 
The general form of the body is that of a woman; the attitude, voice, laugh, 
facial aspect and expression resemble those of a woman rather than of a man. 
The teeth are good, the body and limbs well developed and there is a fair deposit 
of subcutaneous fat. The breasts (see photo) are remarkable, being large, with 
large well-formed nipples and well-marked areolae, dark in colour. They have 
started to become pendulous and resemble exactly those of a nulliparous woman
of the same age. The abdomen is well formed and round the umbilicus there 
is a deposit of fat such as is commonly seen in women; the pelvis appears large. 
There is some hair in the axillae but none on the face or body. The pubes is 
rather prominent resembling the female mons veneris and there is some develop- 
ment of hair upon it. The penis is very small, only two inches in length and of 
infantile type, the glans is covered by a prepuce and there is no deformity. The 
scrotum is very small indeed and only contains one testicle, the left, which can be 
felt as a small body about the size of a bean, three-eighths of an inch long. The 
right testicle is not apparently present in the scrotum or inguinal canal. The 
scrotum shews no tendency to be divided nor is there anything in the arrangement 
of the skin to suggest labia. No rectal examination was made.


The subject is insane. He is fairly tractable and good-natured. He has 
delusions and hallucinations, it is reported, with various phases of the moon, when he 
is said to travel 15 miles to bathe in a certain stream, etc. He has tried to burn 
down some houses. I could get very little of his history. The mother and father 
are said to have been normal; the only other child, a girl, was insane and died in 
the Central Asylum. The subject once cohabited with a woman who was to have been 
his wife, but she ran away the next day and I was unable to find out from him if 
he had any sexual desire. Such is a case which would have been called one of 
Partial Hermaphroditism but in the absence of further data I shall not discuss it. 

(7) Obesity. No cases of general obesity outside normal limits with possibly
a congenital origin have been seen. Steatopygy does not occur. 

(8) Symmetrical Lipomatosis is conveniently considered here though perhaps 
not strictly within the subject. Three old women have been seen all presenting 
the same abnormal feature, namely, the presence of symmetrical lipomata in both 
axillae, each about the size of a small orange. In a fourth case the affection was 
one-sided, the subject giving a history of the gradual descent of the tumour from 
the upper aspect of the shoulder into the arm-pit. 

That these tumours were lipomata I can only support by clinical examination, 
they certainly were not of the nature of the pads seen in myxoedema and no 
signs of that disease were present. There is the possibility that they were acces- 
sory breasts but they did not present the characters found in undoubted cases of 
this condition. These tumours may have a similar pathogeny to the masses seen 
on either side of the back of the neck of men and specially described by 
Sir Jonathan Hutchinson; on account of their possible paleogenetic significance 
I have included notes on these cases here. 


(9) Lymphatism. Post-mortem examination on a boy 10 years of age who died
after receiving a blow on the head revealed a thymus gland of considerable bulk, 
4 inches long. The blow had not severed the soft tissues over the skull and in 
the absence of any other evidence of injury or disease one might suspect the case 
to be one of lymphatism, an inherent disorder which had predisposed to death. 
In a second case, that of a woman aged 40 years who died after moderately severe 
burns, a body 44 inches long of yellow colour and firm consistency was found lying 
on the anterior surface of the heart, the apex of this body being at a level with 
the 2nd costal cartilage. 

(10) Coming now to Malformations, there is a well-defined deformation of the 
skull of which I have seen several examples, the main points of which are well 
shewn in the photographs. The extreme height of the cranium and marked 
dolichocephaly without bossing of the forehead, while the sides of the vault of
the skull are flattened, are characteristic. The photographs depict a boy aged 7, 
son of Matikwiri, headman of Mlanje, whose two younger sisters are said to 
resemble him exactly in the deformity present (Plate IV, (12) and (13)). 

The second case is a boy aged 15 years, the head measured 21.5 cm. long and
12.5 cm. broad (Plate III, (9)-(11)).

(11) Congenital Ptosis is not uncommon and is associated with the typical 
expression due to this disability. A slight degree of Epicanthus may be fairly 
often observed; more marked, it is sometimes seen associated with obliquity 
of the palpebral fissures giving a regular mongolian character to the face 
(Fig. 3). 

Buphthalmos has been seen on two occasions in young adults with a history of 
its congenital nature but nothing else of note; tension normal and vision appa- 
rently good. 

Microphthalmos was once seen associated with coloboma of the iris and choroid
(see below). 


Coloboma. This defect was met with in two brothers aged about 18 and 17 
years, but neither parent nor, as far as I could ascertain, any other member of the
family was similarly affected. Bwanali the elder presented a coloboma of the iris 
and choroid of the left eye; there was also a small opacity on the posterior surface 
of the lens, which however could not be traced more deeply but which suggested a 
remnant of an “arteria centralis.”




Fig. 3. Epicanthus. 


The right cornea shewed some superficial opacities, the iris appeared normal, 
but examination of the fundus revealed a large white triangular area with the 
apex near the disc with here and there small masses of pigment. The middle 
portion of the white area was on a much deeper plane than the rest of the fundus, 
forming a posterior staphyloma, the whole composing a kind of posterior coloboma
(Fig. 4).


Right eye. Left eye. 
Fig. 4. Bwanali Coloboma. 


This boy also had an accessory nipple. 
The younger brother Pete presented on the right side a microphthalmic eye 
with coloboma of iris and choroid resembling the condition in his brother, with an 
opaque spot on the posterior surface of the lens. The eye is convergent and
vision poor; he counts fingers at one yard. The left eye is normal. 


Dermoid Cysts of the Face have been seen in the situations shewn in the sketch
(Fig. 5). One of these was excised and found to contain the usual pultaceous
mass mixed with hairs. These hairs examined microscopically were found to be
spindle shaped, tapering at each end; brown diffuse and granular pigment was
present in them.

Fig. 5. Dermoid cysts of face.

A relic of the cleft between the median and upper external processes of the 
foetal face was on one occasion seen as a small pit at the lower extremity of and 
just external to an epicanthal fold. 


(12) Congenital Naevus. Only two cases of naevus have been seen. One, a
woman, presented a small naevus just to the left of the middle line on the forehead
at the margin of the hairy scalp, 1 cm. in diameter. The second was a man with a 
similar growth 1 cm. in diameter on the lower lip just to the right of the middle 
line (Ching’waya of Zomba). 


(13) Ear. The general conformation of the ear varies a good deal; some of the
types are shewn in the sketches (Fig. 6) but all these must be considered as coming
within the limits of normal variation. In one case a kind of Accessory Lobule was
noted; the subject was an albino. A number of persons with Accessory Auricles
have been seen. These consist of little subcutaneous nodules of cartilage forming
tubercles one to four in number situated just in front of the tragus, the affection
being usually bilateral.

Fig. 6.

An abnormality seen affecting a woman in N. Nyasa consisted in the direct 
prolongation of the skin from the side of the head on to the outer surface of the 
pinna so that the upper margin of the ear was hidden, though easily felt beneath 
the skin. 


Helical fistula. Under this name have been described the remains of the first 
branchial cleft found as little pits on the helix. The condition is certainly rare 
in England and persons exhibiting the anomaly are sometimes shewn as interesting 
cases at medical societies. That heredity plays a part in its incidence is well 
known as illustrated by a case shewn by Dr Prichard at the Royal Society of 
Medicine, an infant with symmetrical helical fistulae, whose mother, four siblings, 
maternal grandmother and two great-aunts all exhibited the same defect. 
Having noted this same anomaly in quite a number of natives I became interested 
to ascertain the actual incidence. The statistics given below embody the results 
of my observations covering nearly 6500 individuals of all tribes. The popula- 
tions of whole villages were taken so that consecutive unselected persons were dealt 
with.


Tribe                      Number examined   Right   Left   Both

[illegible]   Males              416            7       6      3
              Females            612           13       8      5
[illegible]   Males              100            -       4      1
              Females            136            -       3      -
[illegible]   Males             1941           34      22     12
              Females           2576           69      53     23
Wankonde*                        455            4       8      5
Awemba                            48            1       -      1
Anyanja                           65            1       1      -
Ahenga                           142            3       5      -

Totals                          6491          132     110     50


Thus among 6491 individuals of all ages and both sexes a total of 292 were
found to have helical fistula (4.5%). It was more commonly unilateral, affecting
the right side a little more often than the left, giving percentages of 2.03 and 1.69
respectively and for bilateral cases 0.77%. Taking each sex we see that the
proportions between the three numbers are almost the same.


                  Right   Left   Both
2457 males          41      32     16
3324 females        82      64     28


* These figures were kindly supplied by Dr Davey. 

The actual incidence in the two sexes is however greater among females than
males in the proportion of 5.2% to 3.6%. An abnormality occurring so frequently
as 45 per mille might almost be considered to be a variation within the
limits of the normal. The fact remains, however, that it is the persistence of 
a foetal character and abnormal, if the whole of mankind be taken into con- 
sideration. 
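
The percentages quoted for helical fistula can be checked against the totals of the table. What follows is a minimal sketch in Python and is not part of the original paper; the helper `pct` and the variable names are illustrative only, and the male left-side count of 32 is an assumption obtained by summing the three male tribe rows (6 + 4 + 22). On these counts the right-side figure works out at about 2.03 per cent.

```python
def pct(cases, examined):
    """Return cases as a percentage of the number examined."""
    return 100.0 * cases / examined


# Totals row of the table: number examined, right only, left only, both sides.
total_examined = 6491
right, left, both = 132, 110, 50

overall_pct = pct(right + left + both, total_examined)
right_pct = pct(right, total_examined)
left_pct = pct(left, total_examined)
both_pct = pct(both, total_examined)

# Sex summary. The male "left" value (32) is an assumption: it is the sum of
# the male tribe rows, since the printed summary line cannot be read.
males = {"examined": 2457, "right": 41, "left": 32, "both": 16}
females = {"examined": 3324, "right": 82, "left": 64, "both": 28}


def incidence(group):
    """Percentage of a group showing the fistula on either or both sides."""
    affected = group["right"] + group["left"] + group["both"]
    return pct(affected, group["examined"])


male_pct = incidence(males)
female_pct = incidence(females)
per_mille = 10.0 * overall_pct

print(f"overall {overall_pct:.2f}%  right {right_pct:.2f}%  "
      f"left {left_pct:.2f}%  both {both_pct:.2f}%")
print(f"males {male_pct:.1f}%  females {female_pct:.1f}%  "
      f"per mille {per_mille:.0f}")
```

The sexed rows account for 2457 + 3324 = 5781 of the 6491 persons; the remaining four tribes are not divided by sex in the table, so the per-sex incidences rest on the smaller total.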

Dealing more in detail with this defect, there is some variation in the exact 
site of the fistula; the sketches (Fig. 7) serve to illustrate the extremes of posi- 
tion in three directions. 




Fig. 7. 

Three cases presented two pits on the same side, one each in positions A 
and B. In these three cases the affection was bilateral and symmetrical. The 
common position at which the pit is found is in D. In another case not included 
in the series a pit was observed resembling those above mentioned but situated at 
the junction of the tragus and lobule as in E.

These helical fistulae, which I have described as little pits, consist of a small 
opening on the skin 1 or 2 mm. in diameter leading into a blind sac 1 or 2 mm. 
deep; often this sac opens out into a little ampulla which can be seen and felt 
under the skin. The ampulla and canal are generally filled with a little plug of 
sebaceous matter. 

In three cases the skin in this situation looked like scar tissue and presented 
a honey-combed appearance, there being several openings into the ampulla giving 
the impression that an abscess had formed at some past date in the ampulla with 
consequent loss of tissue. 

The fistula is so common and so unremarkable that most tribes have no name 
for it and one cannot elicit long pedigrees to shew its incidence in families. Cases 
of heredity were common enough but the type was not necessarily the same in 
members of the same family; thus a mother with Left Fistula had a child with 
Right and Left, or again, three brothers were seen two with the Left side affected, 
the third with Right Fistula. 

No malformations in connection with other branchial clefts have been seen.




(14) Lips, Mouth and Palate. Most natives shew a well-marked tubercle in 
the median line on the “red” margin of the upper lip; in a few however this is 
replaced by a distinct groove which involves the red margin of the lip or only the 
subjacent fold of mucous membrane (see sketch Fig. 8 and photo, Plate V, (16)). 




Fig. 8. 


These cases resemble one of a Hindu (recorded in the Lancet, Oct. 2, 1909, by 
Thurston), who besides having the median hare-lip was the subject of poly- 
dactylism. In one of my cases there was a considerable gap between the upper 
central incisors but no further abnormalities were present. 


In a single case notching of the upper lip was found to the left of the middle 
line with a mark running up to the nostril which looked like a scar. There was 
no question of any operation having been performed, though the condition 
resembled exactly an artificial repair of a lateral hare-lip (Fig. 9). A similar 
case has been shewn at the W. Lond. Med. Chir. Society in which there was,
besides, a deformity of the nose and a family history of hare-lip. I have only seen 
one case of ordinary Hare-Lip, a Blantyre boy aged 10 years (1909), the affection 
being left-sided and unassociated with any cleft of the palate (Plate V, (17)). 
Among 30,000 natives examined in the northern districts of this country no 
case was seen. 


No case of typical Cleft Palate has come to my notice; on the other hand I 
have seen three cases which owing to their non-association with defects in the 
upper lip are of great interest. All three cases, one a boy aged 10 years (1906), 
the other two adult males, presented complete Absence of the Premaxilla and
attached teeth. In the boy there was also a Median Perforation in the hard 
Palate. Congenital perforations of the palate apart from clefts are apparently 
rare in Europe. Dundas Grant (Roy. Soc. Med. April 1910) has recorded the case 
of a girl aged 16 years with a perforation above and to the right of the base of 
the uvula with no history of trauma or syphilis. Prof. Karl Pearson has drawn
my attention to a skull which was brought by Du Chaillu from Fernand Vas in
the Congo (see Biometrika, Vol. VIII, Plate XXVI); this shews congenital
absence of the premaxilla, but the two maxillae have not approximated in the
mid-line in front as in my own cases, and we do not know the condition of the 
soft parts, but it is interesting to see this anomaly from another part of Africa. 


(15) Teeth. Native children are said to be born sometimes with teeth; it is 
possible that this is not very rare as there is a common superstition regarding 
them. I have seen one case with this history, to be mentioned later, as having 
deformities of the lower extremities. A gap of as much as 4 of an inch between 
the lower central incisors has been noticed a number of times, the other teeth 
all being regular and touching one another. A similar condition may be seen also 
affecting the upper pair of incisors, one that I am not conversant with among 
Europeans. Among 1500 natives examined for statistical purposes in regard to 
caries the following numerical abnormalities were noted: 


(a) Complete reduplication of the set of teeth in an adult, the second set 
lying on the palatal side of what appeared to be the normal set. I have every 
reason to believe that this was a case of true reduplication, that is to say, the 
result of growth from doubled enamel organs and not of retention of the deciduous 
teeth. 


(b) Reduplication of upper incisors. 
(c) Reduplication of right lower bicuspid. 


(d) Reduplication of both bicuspids in the lower jaw on each side and in the 
upper jaw on the right side in a woman aged 24 years. 


A single case of a Bifid Extremity to the Tongue was seen in an albino
child. 


(16) Polymazia and Polythelia. 14 cases of these anomalies have been met 
with casually, so that I imagine this anomaly by excess is comparatively not 
uncommon. Short notes of these cases are given below for purposes of com- 
parison : 

(a) Male adult, accessory nipple springing from the skin at the right sternal 
edge opposite the 3rd intercostal space, it was large and well formed like a
woman’s but there was nothing resembling an accessory mamma beneath it. 


(b) Male aged 45. Insane and suffering from spinal caries. There was a 
rudimentary accessory nipple in Scarpa’s triangle on the right side 14” below 
Poupart’s ligament. 


(c) Adult female, an accessory nipple on the right breast, small but well 
formed and lying above the one proper to the breast; both are patent and milk 
can be drawn through both. 

(d) Adult male, the accessory nipple is situated in a line with the left nipple
below it and half-way between it and the costal margin. 


(e) and (f) Two women each had two nipples to the right breast. 


(g) A young woman was found to have two nipples on the left breast 
(Plate V, (18)). 


(h) Male with congenital coloboma iridis mentioned above has an accessory 
nipple just above and to the inner side of the right nipple. 


(i) Young male adult has just at the outer edge of the areola of the left
breast a very small accessory nipple, and beyond this and above it over the third 
intercostal space another flat nipple with areola and hairs. 


(j) Male, presents a rudimentary nipple in the left groin just below the 
middle of Poupart’s ligament. 


(k) Young adult male shews a small accessory nipple just below and internal 
to the right nipple; his brother, father, and grandfather are all possessed of the 
same identical anomaly. The subject has no children, no nephews or nieces. 


(l) Female in hospital with syphilis has a small accessory nipple springing
from the skin of the chest wall just internal to the point of the left pendant 
breast. 


(m) A woman with well-formed accessory breast in the right axilla. It is 
breast-shaped and pendant though there is no nipple. The woman volunteered 
the fact that it was a breast and said it swelled with pregnancy. The right breast 
was twice as big as the left. 


(n) An old woman with symmetrical masses in each axilla resembling rather
the symmetrical lipomata mentioned elsewhere: see p. 6. She states that they 
appeared at puberty and thinks them to be breasts but denies that they enlarged 
with pregnancy. 

In the Japanese this condition has been shewn to be not unrare, and among 
them tuberculosis has been found to be more frequent than among the normal 
population. I can only support the idea with one case (No. b). 


(17) Meningocoele and Spina Bifida. No typical case has been noted. A man 
was seen with a little dimple of the skin over the lower part of the sacrum in the
median line having a little fold of skin on either side forming two small vertical 
lips. 


(18) Penis, Testicle; Hernia. 
Epispadias, hypospadias and extroversion of the bladder have never been seen. 


I have seen a boy aged 18 years with a short penis enclosed in a fold of skin
from the upper surface of the scrotum (Fig. 10). The boy had other deformities
which are described later. When examining a number of recruits I was surprised 
to find in a large proportion the right testicle hanging lower than the left, the 
reverse of what is known to occur among Europeans. On examination of 400
consecutive men, adults, between the ages of 30 and 40 years, I found in 166 or
41.5% the right testicle lower than the left. In the remainder or 58.5% the
right testicle was on a level with the left, or rather higher in the scrotum. I also
got the impression that, associated with right lower testicles, the testicles and
penis were large. In another series of 280 men, the left was lower than or on the
same level as the right in 185; the right lower in 88. There were two cases of
left cryptorchidism, one of right cryptorchidism; one each left and right hydro-
coeles and two right inguinal bubonocoeles.

Fig. 10. Boy aged 18.
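
The two series of recruits can be tallied as a simple check on the arithmetic. This is a minimal sketch in Python, not part of the original paper; the dictionary keys are illustrative labels, and it is an assumption that the seven categories of the second series are mutually exclusive and together make up the 280 men.

```python
# First series: 400 consecutive adult men.
first_series = 400
right_lower_first = 166
right_lower_pct = 100.0 * right_lower_first / first_series
remainder_pct = 100.0 - right_lower_pct

# Second series: 280 men. Treating these categories as non-overlapping is an
# assumption; the text does not say so explicitly.
second_series = 280
second_counts = {
    "left lower or level": 185,
    "right lower": 88,
    "left cryptorchidism": 2,
    "right cryptorchidism": 1,
    "left hydrocoele": 1,
    "right hydrocoele": 1,
    "right inguinal bubonocoele": 2,
}
accounted_for = sum(second_counts.values())
right_lower_second_pct = 100.0 * second_counts["right lower"] / second_series

print(f"first series: right lower {right_lower_pct:.1f}%, "
      f"remainder {remainder_pct:.1f}%")
print(f"second series: {accounted_for} of {second_series} accounted for, "
      f"right lower {right_lower_second_pct:.1f}%")
```

On this reading the categories of the second series sum exactly to the stated total, which suggests that each man was entered once only.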


I have come across a number of cases of undescended testis among other 
natives, in some associated with a swelling in the inguinal canal, in others there 
was complete cryptorchidism. Inguinal hernia is not infrequent in adult males 
but I can give no figures relating to a large number of persons. In a single man 
it was associated with umbilical hernia. I have never seen a femoral hernia. 
Umbilical hernia is common enough especially in children. The following figures 
though small in number give some idea of the frequent incidence of the condition. 
They refer to all the children in a single village and may therefore be said to be 
unselected in any way. 


Age              [column headings illegible]

0- 1 year           18        13         6
1- 2 years          44        27        12
3-10 years         102        44        10

Totals             164        84        28    = 276,
giving roughly     60%       30%       10%
                             (together 40%)
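
The proportions under the table follow from the column totals. The sketch below, in Python, is not part of the original paper. Because the column headings cannot be read, the columns are referred to by position only, and treating the second and third columns together as the hernia cases is an assumption suggested by the 40 per cent bracket printed beneath them.

```python
# Rows of the table: (age group, column 1, column 2, column 3).
# Column meanings are hypothetical; see the assumption stated above.
rows = [
    ("0-1 year", 18, 13, 6),
    ("1-2 years", 44, 27, 12),
    ("3-10 years", 102, 44, 10),
]

col_totals = [sum(row[i] for row in rows) for i in (1, 2, 3)]
grand_total = sum(col_totals)
col_shares = [100.0 * total / grand_total for total in col_totals]
bracketed_share = col_shares[1] + col_shares[2]

# Share of columns 2 and 3 within each age group.
by_age = {age: 100.0 * (b + c) / (a + b + c) for age, a, b, c in rows}

print(col_totals, grand_total)
print([round(share, 1) for share in col_shares], round(bracketed_share, 1))
for age, share in by_age.items():
    print(f"{age}: {share:.1f}%")
```

If the assumption about the columns holds, the share falls from the youngest group to the oldest, which agrees with the remark that the hernia diminishes after childhood.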


That the hernia diminishes in size even to disappearance after childhood, as 
indicated by these figures, is certainly true as the same incidence is undoubtedly 
not found among adults. The protrusion is sometimes very marked and takes the 
shape of the finger of a glove, some several inches long and curving downwards
(Plate IV, (14)). Writing recently, E. M. Corner, in doubting the commonness of
congenital sacs in hernia in general, as insisted on by some writers, has shewn in a 
series extending to between two and three thousand observations that herniae in 
children are often multiple and associated particularly with a ventral hernia, a 
diastema, which though very rare at birth is common in young children and of the
nature of a true hernia. He believes that this ventral protrusion, which is certainly 
not congenital, is caused by increased abdominal pressure due to gaseous distension 
of the bowels the result of fermentative processes, and that other herniae are due 
to the same cause. Among native children abdominal distension is almost the 
rule; “pot-bellied” is an expression always used in speaking of them. This distension
is due largely, I believe, to fermentative processes, and also to a second factor,
absent in European children, namely enlarged spleen. Of 50 children under the 
age of 5 years taken from among those with umbilical hernia, 43 or 86 °/, were 
found to have the ventral protrusion as described by Corner. 18 of these had 
enlarged spleens and 20 shewed a considerable abdominal distension. In none 
was any other hernia found. In these cases we see par excellence the effect of
intra-abdominal pressure, in producing first ventral hernia and secondly umbilical
hernia. The weakness of the umbilical scar is due, I have little doubt, to the 
method of treating the cord at birth. The custom prevailing among many is to 
bind the whole cord and placenta on to the child’s abdomen till it separates ; 
with others the greater part of the cord is so treated after severing the placenta ; 
in any case there must be considerable tension, I think, at the umbilicus and 
sepsis is more likely to occur. Cursham Corner has said that the size of the 
bulging is proportional to the length of cord left proximal to the ligature, and 
the same principle adapted to natives who use no ligature may be true, and 
thus account for the very “long” umbilical hernias. 


I am therefore inclined to agree with Corner that the umbilical and the ventral 
herniae of children are due largely to intra-abdominal pressure, but though my 
numbers are small, the absence of any other hernia among my cases must be 
taken to mean that for their production there is another factor to be taken into 
account, and that is, I believe, in Corner’s cases some congenital structural 
anomaly, namely a congenital sac, and, conversely, I think congenital sacs are 
uncommon among natives of this country. 


(19) Malformations of the Extremities. Various forms of Congenital Talipes 
are met with which call for no special comment. 


A peculiar condition characterised by symmetrical shortening of the humeri 
has been observed and forms the subject of a paper by Dr S. A. Kinnier Wilson
and myself referred to above; certain deformities of the hands and feet are also 
therein dealt with. 


Since this paper was written I have seen three other cases of Congenital 
Humeral Micromely, one of which I mention here as there is a family history of
the defect, a point of some interest and one which I had not elicited in previous
cases.


Gobedi, male, aged 22 years, a Yao employed as a machila carrier in Zomba, 
exhibits the deformity in typical form well represented in the photograph 
(Plate V, (19)). 


The head of each humerus appears to be poorly developed and though move- 
ment at the shoulder joint is free, a certain amount of fine crepitus is elicited, 
such as was found in several of the other cases. 


The point of interest however is the fact that the maternal aunt is stated to 
have had the same congenital anomaly. 


The subject has no brothers or sisters and his own two young children are
stated to be normal; his mother and father and more remote relations are not
known to be affected.


Besides these the following cases deserve mention. 


A boy was seen, 18 years of age, with a peculiar deformation of the hands, 
stated to be congenital; the fingers and thumb shewed considerable thickening 
about the 1st interphalangeal joints with marked ulnar deflection; the bridge of
the nose was depressed, the lips very thick, and epicanthus present. There was 
also the penile deformity above mentioned, except for which I should have 
doubted the statement in regard to the congenital nature of the hand deformity 
(Fig. 11). 


Fig. 11. Boy aged 18.

I saw at Bandawe a female infant aged 1½ years presenting multiple deformities.
The astragalus of the left foot was apparently implanted in a cup-shaped
depression on the lower end of a very much shortened thigh. The femur
of this leg was short but around it there was an abnormal amount of muscle as 
if the usual amount of muscle for a normal had been cramped up into the 
shortened limb ; the foot could be freely moved by the child. The left foot had only 
a hallux and two toes with a partial cleft between the hallux and the adjacent 
toe, but I think four metatarsal bones. The right thigh was also somewhat 
shortened but the bones of the leg apparently both present, the knee-joint could 
not be distinctly made out and was flail. Right talipes equinovarus present, also 
right internal strabismus. No history of similar deformity in family, a brother a 
year older was born with two upper incisors. Father and mother normal. The 
father has two other wives with six and ten children respectively, all normal. 
Such gross congenital deformities are from time to time recorded in Europe, thus 
Lockhart Mummery described a case of congenital absence of the femur in a male
child, etc., in the Brit. Med. Jour. for November 5, 1910. 


In a male 35 years of age I found Congenital Absence of the Right Fibula, the 
tibia being bowed forward with 8 inches shortening of the limb, the foot on the 
same side had only three metatarsal bones and three digits including the hallux. 
A woman was seen with congenital shortening of one leg to the extent of four 
inches. 


A single case of unilateral Congenital Dislocation of the Hip has been met 
with. 


(20) Split Hand and Split Foot Deformities. The photograph, Plate IV, (15), 
serves to shew moderately well the deformities met with in a male child aged 
5 years (1905): in the absence of a skiagram it is impossible to go into the detail 
of the bony conditions present. There was no admitted history of similar or 
other deformity in the family. 


A second case, Ndala of Njalusi’s Mangoche, shewed a similar deformity of the 
left hand but in a less degree; he was otherwise normal and stated that no other 
members of his family were similarly affected (Plate VI, (20)). 


These cases are interesting to compare with those collected and classified by 
Lewis and Embleton in Biometrika, Vol. VI, 1908.


(21) Shortening of the Fourth Metatarsal Bone. When first I entered the 
country my attention was attracted by a number of natives who presented a 
shortening of the fourth toe. 


Since then Captain Hughes has noticed the condition in Egypt. The description
he gives is as follows (Lancet, July 16, 1910): “The fourth toe is markedly
retracted usually behind the level of the fifth toe. The phalanges are not apparently
abnormally short, and the metatarsal bone can be felt unfractured but with
the head very much farther back than usual. Commonly the digit is pushed
upwards by the pressure inwards of the fifth toe. The condition is sometimes
unilateral sometimes bilateral.” He adds that in one case the second metatarsal
and, in another, the third metatarsal were also shortened. In a single case he
saw a similar condition in the hand, shortening of the second and fifth metacarpals.

The above description corresponds exactly with the condition seen in this 
country. I have also seen other toes than the fourth affected, and I shew a photograph
of a man’s feet with involvement of the metatarsal of the hallux; in
another case the fifth was affected; in another case, a woman, the common variety
was associated with shortening of the third metatarsal of the left foot (see Plate VI,
(22), (24) and Fig. 12).


Fig. 12. Fig. 13.

(22) Syndactyly of various degrees has been observed; sketches of two
examples are given in Fig. 13.

(23) Polydactyly is not at all uncommon. I have casually come across some 
dozen cases in five years. 


In the majority the supernumerary digit consists of a miniature phalanx
attached to the skin of the hand or foot at the level of the head of the fifth metacarpal
or metatarsal bone. Such digits are often removed in childhood, leaving
a small cartilaginous nodule at the seat of removal. Most commonly it is a
symmetrical affection of both hands and feet; in other cases hands or feet alone
(Plate VI, (23)), or one extremity only, present the deformity. In some the
accessory digit is well formed and an accessory metatarsal or metacarpal bone
more or less complete is present. In one case it was the hallux which was
reduplicated, the two digits being partially fused. In another, reported to me,
the supernumerary digit in each hand was situated on the radial side of the
first finger with probably an accessory metacarpal bone in connection with it.
The feet had extra digits beyond the fifth toes.

(24) The following case is of some interest: 

Chibisa, male, an Angoni of Kawenga’s, aged 30 years. The deformities involve
all segments of the right upper and lower limbs and to a minor extent the
left limbs (Plate VI, (21) and Figs. 15-18). On the right side there is shortening
of the humerus and forearm (10 cm. difference between the two sides), but
elongation of all the segments of the middle finger and its metacarpal bone; the
middle finger itself measures 10½ cm. The metacarpal bones and phalanges of
the other fingers are, I think, absolutely a little shortened. The left arm and
hand are normal, except that this hand, as also the right hand, shews a little
nodule at the base of the little finger where a supernumerary digit was removed.


Fig. 15. Fig. 16. Fig. 17. Fig. 18.


The right foot presents a similar condition to the right hand, elongation of
phalanges and metatarsal affecting the second toe, the toe itself being 7 cm.
long. The tibia is somewhat bowed outwards. The left foot presents shortening
of the metatarsal bone of the hallux.

The photograph and sketches illustrate some of these points. Other measurements
were as follows:
Height 166.5 cm.; span of arms 164.5 cm.;
Maximum fronto-occipital 18.0; maximum biparietal 13.8;
Nose length and width 4.4.


(25) Congenital Anomalies of the Kidney. Post-mortem examination on a 
native prisoner who died of pellagra revealed the presence of a double kidney 
on the left side and none on the right. 


From the sketches (Fig. 19) it will be seen that the upper part was the one 
proper to the side while the lower half was the abnormal portion. 


Fig. 19. Churinigu. Kidney of left side double.


The two parts were really very distinct, partly separated by a groove and 
cleft. 

The lower viscus had been felt during life as a tumour in the abdomen of 
unknown nature as it lay along the left side of the vertebral column. The kidney 
was unfortunately removed before dissection of vessels, etc. was made, but the 
sketch shews the arrangement of these at the hilum of the kidney.


The two ureters united below the lower pole of the double organ, the distal 
ureter being nearly twice the normal size.

The bladder was normal; there was no right ureter. The suprarenal body
of the right side was in its normal position and appeared normal. No other 
abnormalities were remarked. 


(26) Some suggestive observations have been made by Dr Ewald Stier,
published in the Deutsche Zeitschrift für Nervenheilkunde (Band xiv, Heft 1-2,
S. 21), from which the generalisation is made that in all anomalies of overgrowth the
right side of the body is much more frequently involved than the left, whereas in
anomalies of undergrowth the left is more commonly the site of the condition
than the right. This distribution is taken to result from a preponderance of persons
with a leading or superior left cerebral hemisphere, since with left-handed persons
the converse was found to be true. In other words, in right-handed people, whose
left hemisphere is the superior one, the plus anomalies occur on the right side and
the minus anomalies on the left; in left-handed people the converse holds.


I have therefore tabulated my observations, and though small in number they 
tend to confirm the idea assuming that the African native is right-handed. This 
remains unproved and a less marked superiority of the left hemisphere may 
account for non-conformity of my few cases to Stier’s rule. 


                                    Right   Bilateral   Left
Plus Anomalies:
  Reduplication of teeth              2         3         0
  Polymazia                           ?         1         0
  Polythelia                         10         0         5
  Polydactyly                         1         1         2
Minus Anomalies:
  Hare-lip                            ?         0         2
  Cryptorchidism                      2         0         2
  Absence of Fibula                   1         0         0
  Absence of Fibula and Tibia         0         0         1
  Split hand, foot                    0         1         1
  Shortened metacarpal, tarsal        0         1         3
  Syndactyly                          1         0         0
  Coloboma iris                       1         1         0
Plus and Minus together:
  Chibisa                           1 (+)       -       1 (-)


In considering these cases it should be remembered that the majority of my
observations have been made casually among natives met in the bush, in villages,
etc., others in the course of routine work among troops, prisoners, etc., and only a few
were the result of special investigation.


(27) Concluding Remarks. The notes of cases which I have thus collected 
together form rather a medley of facts but I think certain deductions may be 
made from them.

It would appear that


(1) The slighter the anomaly the greater the frequency with which it may 
be observed. 


(2) The more marked degrees of deformity are only seen in children and 
those in places where European influence is felt. 


(3) Cases of heredity are only seen among the lesser anomalies. 


(4) The least obvious congenital anomaly is a helical fistula, and this is
found in 46 % of the population and is frequently inherited.


The difference in the observed incidence between the minor anomalies and 
those of more marked proportions may be real or only apparent. I think the 
latter supposition is true for reasons which can be deduced from the facts given 
above. 


It is the custom among all the tribes of this country to destroy all deformed 
children at birth. Any minimal deformity such as a helical fistula is of course 
unrecognised, an accessory nipple is probably hardly noticeable, accessory digits 
which can be removed by a nick with a knife are matters of no import, while a 
foot with six well-formed toes would hardly be considered worthy of note. These 
abnormalities are therefore comparatively common, but hare-lip, cleft-palate, 
deformities common enough in Europe, are among the rarest in this country; a 
child with a hare-lip would be seen to resemble a hare and would be immediately 
destroyed. Children with the greater deformities would certainly be destroyed. 
In recent years under European influence native customs fall into abeyance and so
we see my single case of hare-lip in a boy aged 10 at Blantyre, a township of 25
years' standing; a child with gross deformities of the lower extremities born practically
on a mission station; or, to quote another example, an albino reported by
myself was the fifth albino child born, the first four having been killed at birth 
by order of a chief, who in later years came under the influence of an up-country 
mission station, for which the living albino has to thank his survival. The gross 
abnormality of absence of premaxilla would pass unnoticed as the deformity is 
slight. History relates that in the case of the child with lobster claw deformity of 
hands and feet, it was only saved from a summary death by the efforts of the 
mother. 


I think with the evidence as it stands one may with fairness say that congenital
anomalies are common among the natives of this country. Secondly, I
think one may also deduce from the facts stated that abnormalities of all kinds 
are at least not uncommon. In the few cases in which I have adduced statistics 
there can be no doubt, in other cases it is rather a matter of one’s impression. 


I have shewn that certain congenital anomalies among natives of Nyasaland 
are common and have attempted to argue that probably many of them are 
common. 


(28) Very few statistics are available for comparison, but I should like to refer
to some by writers in Egypt. Prof. Madden cites in a letter to the Lancet a case
of cleft-palate which he operated on as the first in 11 years during surgical work
at the Kasr-el-ainy Hospital, and assigns as the cause of the lack of such cases
the “truly awful struggle for existence” which would eliminate infants so handicapped.
The Lancet remarked (Lancet, July 3, 1909), in an annotation upon this
letter, that Prof. Elliot Smith considers it to be impossible to endeavour to explain
this rarity of congenital defects in Egypt, unless the time-honoured scapegoat of
our too modern civilisation be invoked to account for their frequency in other
countries. Statistics of the Kasr-el-ainy Hospital compiled by Dr Day are quoted:
in 1907, among 2630 total surgical admissions, the only congenital deformities
were 5 hare-lips, 2 talipes, 2 imperforate anus, 1 extroversion of bladder; in 1908,
among 2702 admissions, 3 hare-lips, 2 imperforate anus, 1 hypospadias, 1 undescended
testicle, 1 meningo-encephalocoele. Capt. G. W. G. Hughes, R.A.M.C., in a paper
to the Lancet, July 16, 1910, referring to this annotation, remarks “Readers will
be interested to hear that our too modern civilisation is innocent of this slur,”
and goes on to shew that many congenital defects are by no means uncommon.
Dealing with males between the ages of 14 and 21 years he gives the following
figures:

Hare-lip in 0.041 %.
Cleft-palate 0.016 %.
Polydactylism 0.058 % and 0.04 %, in two series.
Shortened metatarsal 0.37 % and 0.23 %.
Other deformities of fingers and toes 0.22 %.
Talipes 0.16 %.

Among the thousands of ancient Egyptian bodies which Prof. G. Elliot Smith 
has unearthed and examined, a single case of cleft-palate was met with, a female 
of 20 years of age with a skull of negroid type, of between the 4th and 6th 
century B.C.; only one case of talipes (T. equinovarus) was recorded.


It is obvious that in Egypt surgical treatment is not sought in cases of 
cleft-palate and rarely for other congenital defects but many of them are common 
enough. 


May the rarity of defects among the ancient peoples of Egypt be due to the 
same cause that acts in Nyasaland to-day? Were the children affected with 
deformities killed at birth and “thrown onto the dust-heap” where their remains 
were soon lost trace of? Of chief interest to me are the figures published by 
Captain Hughes. He shews that a shortening of the 4th metatarsal bone occurs
in percentages rising to 0.37 of males examined. This defect is peculiarly common
in this country. Again, polydactylism occurs in 0.05 % and other deformities of
fingers and toes in 0.22 % of Egyptians, both deformities very frequently met
with by myself in Nyasaland.
He however does not mention polythelia and polymazia nor helical fistula.
I should be very interested to learn if this last insignificant anomaly was looked 
for. There is no doubt that one of them, helical fistula, occurs with a frequency 
in Nyasaland which cannot be rivalled by any other among peoples of any race. 
I think one may also say with certainty that the incidence of others (shortened 
4th metatarsal and polydactylism) in this country is far in excess of that among 
Europeans, though probably much about the same as in Egypt. 


Upon what hypothesis can these facts be explained? Is there a single cause 
or are there many at work? These are questions which I shall not attempt to 
enter into, but by simply recording my observations I shall hope to stimulate 
others to do the same, for only by accumulating facts can it be hoped that such 
problems will ever be solved. 


Biometrika, Vol. X, Part I. Plate I.


(1) Samuti, an Ateliotic Dwarf.
(2) Subgiant, Height 1.92 metres, with Wife and Albinotic Child.

Etimu, aged 25, an Achondroplasic Dwarf.

(5) (6) Masimosya, aged 19, Gynaecomastos, with other features which were formerly described as those of
Partial Hermaphroditism.

(8) Microcephalic Infant.
[Case of Hydrocele testis included by an oversight of Dr Stannus in the photographs, and
engraved in consequence. Discovered too late to rearrange plates.]

(13) Son of Matikwiri, aged 7, a case of Scaphocephaly.

(14) Cases of Umbilical Hernia.
(15) Split Hand and Foot in a child, aged 5.

(16) Case showing faint median depression of upper lip.
(17) Blantyre boy, aged 10, with Hare-lip.

(18) Young woman with two nipples on left breast.
(19) Gobedi, aged 32, with congenital Humeral Micromely.

(20) Ndala, Split Hand, left only.
(21) Chibisa, aged 30, elongation of all segments of middle finger and its metacarpal bone.

(22) Shortening of the fourth metatarsal bone.


(24) Shortening of the left great toe.


TABLES OF POISSON’S EXPONENTIAL
BINOMIAL LIMIT.


By H. E. SOPER, M.A.


In his treatise, Recherches sur la Probabilité des Jugements, Paris, 1837,
Poisson* shows that the series of frequencies

$$p^n + n p^{n-1} q + \frac{n(n-1)}{2!} p^{n-2} q^2 + \dots + \frac{n!}{r!\,(n-r)!} p^{n-r} q^r + \dots$$

given by the expanded terms of the binomial

$$(p+q)^n$$

becomes in the limit, when q is diminished, and n increased, indefinitely, but so
that nq remains finite and equal to m, the exponential series

$$e^{-m}\left(1 + m + \frac{m^2}{2!} + \dots + \frac{m^r}{r!} + \dots\right);$$

and he points out that the terms of this series will give the proportional
frequencies of the occurrences

$$0, 1, 2, \dots, r, \dots$$

times, in any sample, of an event, every occurrence of which is equally likely in
the sample and independent of the other occurrences, and which is of such
frequency that m events occur in the sample on an average.


The series is arrived at by “Student†,” when considering the theoretical
frequencies in sample drops of a liquid of minute corpuscles supposed distributed 
at random throughout the mass of the liquid. 


* pp. 205 et seq.
† Biometrika, Vol. V. p. 351, “On the Error of counting with a Haemacytometer.”

The event may also occur in time, each occurrence being supposed to take
place with equal probability in any finite period taken as the sample, and to act
independently of the occurrences of all the other events. A physical example,
which appears by the closeness of the observed to the theoretical frequencies to
satisfy these conditions, is the number of α-particles discharged per ⅛-minute or
¼-minute interval from a film of polonium*.

In vital statistics the sample may be an individual or house or community and 
the event an accident or disease and so on. But it must be borne in mind that 
for such series as the above to be applicable the occurrence of one event in the 
sample must not preclude or influence in any way the occurrence of a second. 

The probability of x occurrences, m being the mean number, in a sample, is

$$e^{-m} m^x / x!$$

and in the tables which follow this is evaluated for m = 0.1, 0.2 ... to 15.0 and for
x = 0, 1, 2 ... up to such an integer as gives a figure in the sixth place of decimals,
the number of places tabulated.
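
As an illustration of the general term just stated, the following is a minimal Python sketch that evaluates e^{-m} m^x / x! and rounds it to six places. The function names `poisson_term` and `tabulate` are hypothetical, introduced here only for the example, and the stopping rule is one reading of "up to such an integer as gives a figure in the sixth place of decimals"; it is not the method by which the printed table was computed.

```python
import math


def poisson_term(m: float, x: int) -> float:
    """General term e^(-m) * m^x / x! of the exponential series."""
    return math.exp(-m) * m**x / math.factorial(x)


def tabulate(m: float, places: int = 6) -> list[float]:
    """Terms for x = 0, 1, 2, ... rounded to `places` decimals.

    Stops at the first x beyond the mean whose term rounds to zero
    (an assumed reading of the tabulation rule in the text).
    """
    terms = []
    x = 0
    while True:
        term = round(poisson_term(m, x), places)
        if term == 0 and x > m:
            break
        terms.append(term)
        x += 1
    return terms


if __name__ == "__main__":
    for m in (0.1, 1.0):
        print(m, tabulate(m))
```

Calling `tabulate(0.1)` gives five entries, for x = 0 to 4, which can be compared with the first column of the table that follows.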


The terms of the series were calculated, each by a fractional operation upon
the preceding, beginning with the modal term and going both forward and back.
Thus if m = 7.6 the term $e^{-7.6} \times (7.6)^7/7!$ was first calculated by tables of logarithms,
and the succeeding terms were then obtained seriatim by the operations

$$\times \frac{7.6}{8}, \quad \times \frac{7.6}{9}, \quad \times \frac{7.6}{10}, \quad \text{etc.},$$

and the preceding ones by the operations

$$\times \frac{7}{7.6}, \quad \times \frac{6}{7.6}, \quad \times \frac{5}{7.6}, \quad \text{etc.},$$

done with a mechanical calculator, first a multiplication and then a division.


Seven places of decimals were thus calculated and the series is checked by the 
total, which differs from unity by the remainder (a figure in the eighth or later 
place of decimals in all the present cases) and the algebraical sum of the errors of 
seventh figure approximations. 
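
The procedure of working outward from the modal term, and the check by the total, can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `poisson_series_from_mode` and `checked_total` are hypothetical, the modal term is obtained with `math.lgamma` in place of logarithm tables, and the forward recursion is cut off at an arbitrary tolerance rather than at a fixed number of terms.

```python
import math


def poisson_series_from_mode(m: float, tol: float = 5e-9) -> dict[int, float]:
    """Build the terms e^(-m) m^x / x! by recursion from the modal term."""
    mode = math.floor(m)
    # Modal term computed through logarithms (here via lgamma, an assumption).
    start = math.exp(-m + mode * math.log(m) - math.lgamma(mode + 1))
    terms = {mode: start}

    # Succeeding terms: multiply by m, divide by the next value of x.
    term, x = start, mode
    while term >= tol:
        x += 1
        term = term * m / x
        terms[x] = term

    # Preceding terms: multiply by x, divide by m.
    term, x = start, mode
    while x > 0:
        term = term * x / m
        x -= 1
        terms[x] = term
    return terms


def checked_total(terms: dict[int, float], places: int = 7) -> float:
    """Sum of the terms after rounding each to `places` decimals."""
    return sum(round(t, places) for t in terms.values())


if __name__ == "__main__":
    series = poisson_series_from_mode(7.6)
    print(series[7], checked_total(series))
```

The total of the rounded terms should differ from unity only by the neglected remainder and the accumulated rounding errors, which is the check described above.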


Poisson’s exponential series has been previously calculated to four places of
decimals by L. von Bortkewitsch† for values of m from 0.1 to 10.0.


The present tables give the probability of each number of times of occurrence
of the event. For the sums of these values, that is, the probability of occurrence
of the event a given number of times or greater, or a given number of times
or less, reference must be made to a second paper in this issue of Biometrika‡,
where such probabilities are calculated for integral values of m from 1 to 30.
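
The distinction between the single terms tabulated here and their sums can be made concrete with a short Python sketch. The helper names `prob_at_most` and `prob_at_least` are hypothetical and the code is not taken from either paper; it simply adds up the general term.

```python
import math


def poisson_term(m: float, x: int) -> float:
    """Probability of exactly x occurrences when the mean number is m."""
    return math.exp(-m) * m**x / math.factorial(x)


def prob_at_most(m: float, k: int) -> float:
    """Probability of k occurrences or fewer: sum of the terms 0..k."""
    return sum(poisson_term(m, x) for x in range(k + 1))


def prob_at_least(m: float, k: int) -> float:
    """Probability of k occurrences or more: complement of the sum 0..k-1."""
    if k <= 0:
        return 1.0
    return 1.0 - prob_at_most(m, k - 1)


if __name__ == "__main__":
    m = 3
    print(prob_at_most(m, 2), prob_at_least(m, 3))
```

For any m and k the two quantities `prob_at_most(m, k)` and `prob_at_least(m, k + 1)` add to unity, which gives a simple consistency check on such sums.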


* See Rutherford and Geiger: “The Probability Variations in the Distribution of α-Particles,”
Philosophical Magazine, Vol. xx. p. 700, 1910. See also E. C. Snow, “Note on the Probability Variations,
&c.,” Vol. xxi. p. 198, 1911, who finds the variance of experiment from theory to be such
as would occur once in six experiments and once in three experiments respectively of the limited time
taken, were theory exact. In a note to the first paper H. Bateman gives a proof of the exponential series
of probabilities arrived at from considerations of this problem.

† Das Gesetz der kleinen Zahlen, 1898. A comparison of the table printed therein with the present
table shows agreement except as to the fourth figure; the nearest fourth figure is not given, in rather
many instances, in the tables of Bortkewitsch.

‡ Lucy Whitaker, B.Sc. “On the Poisson Law of Small Numbers,” Vol. x. p. 37 et seq.


TABLE of $e^{-m} m^x / x!$: General Term of Poisson's Exponential Expansion
(“Law of Small Numbers”).


x \ m |   0.1   |   0.2   |   0.3   |   0.4   |   0.5   |   0.6   |   0.7   |   0.8   |   0.9   |   1.0
  0   | .904837 | .818731 | .740818 | .670320 | .606531 | .548812 | .496585 | .449329 | .406570 | .367879
  1   | .090484 | .163746 | .222245 | .268128 | .303265 | .329287 | .347610 | .359463 | .365913 | .367879
  2   | .004524 | .016375 | .033337 | .053626 | .075816 | .098786 | .121663 | .143785 | .164661 | .183940
  3   | .000151 | .001092 | .003334 | .007150 | .012636 | .019757 | .028388 | .038343 | .049398 | .061313
  4   | .000004 | .000055 | .000250 | .000715 | .001580 | .002964 | .004968 | .007669 | .011115 | .015328
  5   |    -    | .000002 | .000015 | .000057 | .000158 | .000356 | .000696 | .001227 | .002001 | .003066
  6   |    -    |    -    | .000001 | .000004 | .000013 | .000036 | .000081 | .000164 | .000300 | .000511
  7   |    -    |    -    |    -    |    -    | .000001 | .000003 | .000008 | .000019 | .000039 | .000073
  8   |    -    |    -    |    -    |    -    |    -    |    -    | .000001 | .000002 | .000004 | .000009
  9   |    -    |    -    |    -    |    -    |    -    |    -    |    -    |    -    |    -    | .000001

x \ m |   1.1   |   1.2   |   1.3   |   1.4   |   1.5   |   1.6   |   1.7   |   1.8   |   1.9   |   2.0
  0   | .332871 | .301194 | .272532 | .246597 | .223130 | .201897 | .182684 | .165299 | .149569 | .135335
  1   | .366158 | .361433 | .354291 | .345236 | .334695 | .323034 | .310562 | .297538 | .284180 | .270671
  2   | .201387 | .216860 | .230289 | .241665 | .251021 | .258428 | .263978 | .267784 | .269971 | .270671
  3   | .073842 | .086744 | .099792 | .112777 | .125510 | .137828 | .149587 | .160671 | .170982 | .180447
  4   | .020307 | .026023 | .032432 | .039472 | .047067 | .055131 | .063575 | .072302 | .081216 | .090224
  5   | .004467 | .006246 | .008432 | .011052 | .014120 | .017642 | .021615 | .026029 | .030862 | .036089
  6   | .000819 | .001249 | .001827 | .002579 | .003530 | .004705 | .006124 | .007809 | .009773 | .012030
  7   | .000129 | .000214 | .000339 | .000516 | .000756 | .001075 | .001487 | .002008 | .002653 | .003437
  8   | .000018 | .000032 | .000055 | .000090 | .000142 | .000215 | .000316 | .000452 | .000630 | .000859
  9   | .000002 | .000004 | .000008 | .000014 | .000024 | .000038 | .000060 | .000090 | .000133 | .000191
 10   |    -    | .000001 | .000001 | .000002 | .000004 | .000006 | .000010 | .000016 | .000025 | .000038
 11   |    -    |    -    |    -    |    -    |    -    | .000001 | .000002 | .000003 | .000004 | .000007
 12   |    -    |    -    |    -    |    -    |    -    |    -    |    -    |    -    | .000001 | .000001

x \ m |   2.1   |   2.2   |   2.3   |   2.4   |   2.5   |   2.6   |   2.7   |   2.8   |   2.9   |   3.0
  0   | .122456 | .110803 | .100259 | .090718 | .082085 | .074274 | .067206 | .060810 | .055023 | .049787
  1   | .257159 | .243767 | .230595 | .217723 | .205212 | .193111 | .181455 | .170268 | .159567 | .149361
  2   | .270016 | .268144 | .265185 | .261268 | .256516 | .251045 | .244964 | .238375 | .231373 | .224042
  3   | .189012 | .196639 | .203308 | .209014 | .213763 | .217572 | .220468 | .222484 | .223660 | .224042
  4   | .099231 | .108151 | .116902 | .125409 | .133602 | .141422 | .148816 | .155739 | .162154 | .168031
  5   | .041677 | .047587 | .053775 | .060196 | .066801 | .073539 | .080360 | .087214 | .094049 | .100819
  6   | .014587 | .017448 | .020614 | .024078 | .027834 | .031867 | .036162 | .040700 | .045457 | .050409
  7   | .004376 | .005484 | .006773 | .008255 | .009941 | .011836 | .013948 | .016280 | .018832 | .021604
  8   | .001149 | .001508 | .001947 | .002477 | .003106 | .003847 | .004708 | .005698 | .006827 | .008102
  9   | .000268 | .000369 | .000498 | .000660 | .000863 | .001111 | .001412 | .001773 | .002200 | .002701
 10   | .000056 | .000081 | .000114 | .000158 | .000216 | .000289 | .000381 | .000496 | .000638 | .000810
 11   | .000011 | .000016 | .000024 | .000035 | .000049 | .000068 | .000094 | .000126 | .000168 | .000221
 12   | .000002 | .000003 | .000005 | .000007 | .000010 | .000015 | .000021 | .000029 | .000041 | .000055
 13   |    -    | .000001 | .000001 | .000001 | .000002 | .000003 | .000004 | .000006 | .000009 | .000013
 14   |    -    |    -    |    -    |    -    |    -    | .000001 | .000001 | .000001 | .000002 | .000003
 15   |    -    |    -    |    -    |    -    |    -    |    -    |    -    |    -    |    -    | .000001



| 11 | *000287 | -000368 | *000467 | 000587 | -000730 | -00090) | °001102 | -001337 | -001610 | 001925 
12 | ‘000074 | 000098 | °000128 | 000166 | -000213 | -090270 | 000340 | -000423 | -000523  -000642 
3% | ‘000018 | -000024 | *000033 | -000043 , -000057 | -000075 | ‘000097 | 000124 | -000157 | -000197 
14 | -000004 | -000006 | “000008 | -000011 000014 | 000019 | -000026 | -000034 | “000044 | -000056 
| 15 | 000001 | *O00001 | “000002 | -000002 | -000093 | -000005 | 000006 | *CO0009 | -000011 | -000015 


MM 
v ie eer ae — as 
81 32 3-8 ow) ass 36 Pil 3:8 39 40 

0 | 045049 | -040762 | -036883 | -033373 | -030197 | 027324 | -024724 | 022371 | 020242 | -018316| 0 

1 | *139653 | -130439 | 121714 | -113469 | -105691 | -098365 | -091477 | -085009 | -078943 | -073263| 1 

2 | 216461 | -208702 | *200829 | -192898 | -184959 | -177058 | -169233 | “161517 | 153940 | -146525| 2 

3 | 223677 | -222616 | -220912 | -218617 | -215785 | -212469 | 208720 | -204588 | -200122 | -195367| 3 

4 | 173350 | -178093 | 182252 | -185825 | -188812 | -191222 | -193066 | -194359 | -195119 | -195367| 4 

5 | ‘107477 | -113979 | *120286 | -126361 | -132169 | -137680 | 142869 | -147713 | 152193 | 156293) 5 

6 | 055530 -060789 | 066158 | 071604 | -077098 | -082608 | -088102 | 093551 | -098925 | 104196 6 

7 | 024592 | -027789 | -031189 | -034779 | 038549 | -042484 | 046568 | -050785 | -055115 | -059540| 7 

8 | 009529 | 011116 | 012865 | -014781 | -016865 | -019118 | -021538 | -024123 | -026869 029770! 8 

9 | 003282 | -003952 | °004717 | -005584 | -006559 | -007647 | -008854 | 010185 | -011643 | -013231| 9 

10 | -001018 | -001265 | ‘001557 | -001899 | -002296 | -002753 | 003276 | 003870 | 004541 | 005292 | 10 
1 
2Q 
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16¢)| = : — 000001 -000001 | -000001 | -000001 | -000002 , -000003 | -000004 | 16 
Ly =) ieee ee or ee = a “000001 | 000001 | 17 
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0 | 016573 | -014996 | °013569 | -012277 | -011109 | -010052 | ‘009095 | :008280 , -007447 | :006738 
1 | 067948 | -062981 | 058345 | °054020 | 049990 | -046238 042748 | °039503 | -036488 | -033690 
2 | *139293 | +132261 | 125441 | +118845 | -112479 | 106348 | -100457 | ‘094807 | -089396 | -084224 
3 | 190368 | +185165 | 179799 *174305 | -168718 | -163068 | °157383 | °151691 | -146014 | -140374 
4 | 195127 | 194424 | -193284 | 191736 | :189808 | -187528 | 184925 | +182029 | *178867 | °175467 
5 | 160004 | +163316 | °166224 | +168728 | 170827 | °172525 | -173830 | °174748 | -175290 | 175467 
G | *109336 | -114321 | +119127 | 7123734 | -128120 | -132270 | 136167 | 1139798 | 143153 | 146223 
7 | (064040 | -068593 | ‘073178 | ‘077775 | 082363 | 086920 | 091426 | 095862 | -100207 | *104445 
032820 | -036011 | 039333 | 042776 | 046329 | 049979 | -053713 | 057517 | ‘061377 | -065278 
9 | -O14951 | -016805 | 018793 °020913 | -023165 | 025545 | 028050 | °030676 | -033416 | -036266 
10 | -006130 | 007058 | 008081 | °009202 | -010424 | -011751 | -013184 | 014724  -016374 | 018133 
11 | -002285 | 002695 | -003159 | -008681 | -004264 | -004914 | -005633 , 006425 007294 | 008242 
12 | 000781 | -000943 | -001132 | 001350 | -001599 | 001884 | :002206 | 002570 -002978 | :003434 
3 | -000246 | -000305 | 000374 | 000457 | 000554 | -000667 | 000798 | -000949 001123 | -001321 
14 | 000072 | 000091 | -000115 | 000144 | -000178 | -000219 | 000268 | 000325 | 000393 | -000472 
15 | 000020 | -000026 | -000033 | -000042 | -000053 000067 | -009084 | -000104 -000128 | 000157 
16 -Q00005 | 000007 | 000009 | ‘000012 | -000015  -000019 | -000025 | 000031 -000039 000049 
17 | 000001 | -000902 | *000002 | *000003 | -000004  -000005 | -000007 | *000009 | *000011 | :000014 




18 | — | — | .000001 | .000001 | .000001 | .000001 | .000002 | .000002 | .000003 | .000004
19 | — | — | — | — | — | — | — | .000001 | .000001 | .000001
x | m = 5.1 | 5.2 | 5.3 | 5.4 | 5.5 | 5.6 | 5.7 | 5.8 | 5.9 | 6.0 | x


0 006097 | °005517 | 004992 | -004517 | -004087 | -003698 | 003346 | -003028 | 002739 | 002479 
031093 | ‘028686 | -026455 | °024390 | 022477 | ‘020708 | 019072 | -017560 | 016163 | -014873 
2 | -079288 | 074584 | 070107 | -065852 | -061812 | -057982 | :054355 | -050923 | -047680 | -044618 
3 134790 | 129279 | 123856 | 118533 113323 | *108234 | +103275 | 098452 | ‘093771 | 089235 | 


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TA BLE—(continued). 


m = 5.1 | 5.2 | 5.3 | 5.4 | 5.5 | 5.6 | 5.7 | 5.8 | 5.9 | 6.0 | x


171857 | *168063 164109 | -160020 | *155819 | *151528 | 147167 | °142755 | 138312 | 133853) 4 
175294 | 174785 173955 | -172821 | 171401) -169711 | 167770 | -165596 | *163208 | *160623 5 
*149000 | °151480 | -153660 | 155539 | °157117 ) °158397 | *159382 | -160076 | 160488 | 160623) 6 
“108557 | °112528 | -116343 | -119987 | °123449 126717 | 129782 | +132635 | °135268 1387677) 7 | 
069205 | 073143 -077077 | -080991 | ‘084871 -088702 | 092470  -096160 | ‘099760 *103258 8 

039216 | ‘042261  -045390 | 048595 | °051866 | -055192 ‘058564 | -061970 | °065398 | 06838 | 9 
“020000 | 021976 024057 | -026241 | °028526 -030908 | °033382 | -035943 | 038585 | 041303, 10 
-009273 | °010388 -011591 | -012882 | °014263  -015735 017298 | 018952 | -020696 | 022529 | 71 
003941 | ‘004502 -005119 | -005797 | °006537  -007343 | -008216 | 009160 | 010175 | ‘011264 | 12 
001546 | 001801 | 002087 | 002408 | °002766  -003163  -003603 | -004087 | 004618 | °005199 | 13 

‘000563 | -000669 | -000790 | -000929  -001087  -001265 | 001467 | -001693 | °001946 | "002228 | 14 
‘000191 | 000232 | -000279 | -000334 000398 000472 | 000557 -000655 | ‘000766 000891 15 
000061 | ‘000075 | -000092 | -000113 | *000137 | -000165 | 000199 | -000237 | *000282 | ‘000334 16 
000018 | -000023 | -000029 | -000086 | ‘000044 | -000054 | ‘000067 | 000081 | ‘000098 | ‘000118 | 17 
“000005 | ‘000007 | ‘000008 | :000011 | ‘000014 | -G00017 | 000021 | -000026 | -000032 | ‘000039 | 18 


000001 | 000002  -000002 | -000003 | -000004 | -000005 | -000006 | -000008 | -000010  -000012 | 19 
= — 000001 000001 | 000001 | -000001  -000002 | -000002 | 000003 | -000004 | 20 
= = —- ; — ar — -= | :Q00001 | *O00001 | *OO0001 | 21 
m = 6.1 | 6.2 | 6.3 | 6.4 | 6.5 | 6.6 | 6.7 | 6.8 | 6.9 | 7.0 | x


002243 | -002029 | -001836 | °001662 | ‘001508 | -001360 | -001231 | -001114 | -001008 | ‘000912 | 0 
013682 | °012582 | -011569 | -010634 | 009772 | ‘008978 | °008247 | 007574 | °006954 | ‘006383 | 1 
041729 | :039006 | :036441 | °034029 | 031760 | 029629 | 027628 | -025751 | -023990 | 022341 | 2 
084848 | ‘080612 | -076527 | °072595 | ‘068814 | 065183 | 061702 | -058368 | -055178 | 052129 | 3 
129393 | °124948 | :120530 | -116151 | °111822 | -107553 | :103351 | (099225 | -095182 | ‘091226 | 4 
"157860 | °154936 | *151868 | -148674 | °145369  +141969 | 138490 | 134946 | -131351 | 127717) 6 
"160491 | °160100 | -159461 | +158585 | °157483 | -156166 | °154648 | 152939 | 151053 | -149003 | 6 
139856 | *141803 | *143515 | 144992  *146234 | -147243 | :148020 | 148569 | °148895 | *149008 | 7 
"106640 | -109897 | :113018 | °115994 | ‘118815 | °121475 | -123967 | °126284 | -128422 | 130377] $& 


072278 | ‘075707 | :079113 | -082484 | 085811 | 089082 | 092286 | -095415 | 098457 | 101405 | 9 
044090 | 046938 | -049841 | -052790 | 055777 058794 | 061832 | 064882 | 067935 | ‘070983 | 10 
024450 | 026456 028545 | :030714 | °032959 | 035276 | -037661 | -040109 | -042614 | “045171 | 11 
012429 | ‘013669 | 014986 | :016381 | °017853 | -019402 | -021028 | -022728 | -024503 | 026350 | 12 
005832 | 006519 | -007263 | ‘008064 | °008926 | -009850 | :010837 | -011889 | -013005 | 014188 | 13 
002541 | ‘002887 003268 003687 | (004144 -004644 | -005186 | 005774 | ‘006410 ‘007094 | 14 
001033 | °0011938  -001373 | -001573 | 001796  -002043 | -002317 | 002618 | -002949 003311 | 14 
‘000394 | :000462 | -000540 | -000629 | 000730 | -000843 | -000970 | 001113 | 001272 *001448 | 10 
000141 | ‘000169 | 000299 | 000237 | ‘000279 *000327 | *000382 | -000445 | ‘000516 | ‘000596 | 17 
000048 | -000058 | 000070 | ‘000084 | ‘000101 000120 | :000142 | ‘000168 | 000198 -000232 | 1S 
‘000015 | *000019 | -000023 | -000028 | -000034  -000042 | -000050 | -000060 | -000072 ‘000085 | 19 
000005 | ‘000006 | -000007 | -000009 | °000011  -000014 | ‘000017 | -000020 | 000025 000030 | 20 
000001 | *090002 | -000002 | -000003 | *000003 | -000004 | 000005 | -000007 | ‘000008  -000010 | 271 

= -— | -000001 | 000001 | -000001 ‘000001 | *000002 | -000002 | -000003  -000003 | 2.2 

= = | -—- —- | — | = — 000001 | 000001 000001 | 22 




30 Poisson's Exponential Binomial Limit 
TABLE—(continued). 
x | m = 7.1 | 7.2 | 7.3 | 7.4 | 7.5 | 7.6 | 7.7 | 7.8 | 7.9 | 8.0 | x
0 | :000825 | 000747 | -000676 | -000611 | 7000553 | -000500 000453 | 000410 | -000371 | 000335 | 0 
1 | :005858 | ‘005375 | 004931 | 004523 | 004148  -003803 | (003487 | -003196 | 002929 | 002684] 1 | 
2 | 020797 | 7019352 | -018000 | 016736 | ‘015555 | 014453 | (013424 | -012464 | 011569 | 010735 | 2 
3 | 049219 | 046444 | 043799 | 041282 | 038889 036614 | 084455 032407 -030465 | 7028626] 3 
4 | (087364 | -083598 | -079934 | 076372 | (072916 069567 | 066326  -063193 060169 | 057252 | 4 
5 | 124057 | 120382 | +116703 | -113031 | °109375 105742 | -102142 -098581 ‘095067 | 091604} 5 
6 | 146800 | "144458 | 141989 | :139405 | 136718 °133940 | "131082 | -128156 125171 | *122138| 6 
7 | *148897 | *148586 | *148074 | -147371 | °146484 | °145421 | 144191 | -142802 | -141264 | 139587] 7 
8 | 132146 | -183727 | °135118 | +136318 | ‘137329 | -138150 | *138783 | -139232 | -139499 | -139587| 8 
9 | *104249 | *106982 | 109596 | *112084 | *114440 | °116660 | 118737 | -120668 | -122449 | 124077] 9 
10 | ‘074017 | 077027 | 080005 | -082942 | 085830 | ‘088661 | °091427  -094121 | 096735 | -099262 | 10 
11 | ‘047774 050418 | °053094 -055797 | °058521 | °061257 063999 -066740 | 069473 | -072190) 11 
12 | 028267 | -030251 | ‘032299 | -034408 | 036575 | 038796 041066 | -043381 | 045736 | -048127 | 12 
3 | :015438 | 016754 018137 019586 | ‘021101 | -022681 | °024324 -026029 | 027794 | 029616 | 13 
14 | 007829 | 008616 009457 | -010353 | 011304 | -012312 | 013378 -014502 | -015684 | 016924 | 14 
5 | 003706 -004136 004603 -005107 | *005652 006238 ‘006867 -007541 | 008260 | -009026 | 15 
16 | 001644 001861 002100 002362 | 002649 002963 | °003305  -003676 | -004078 | 004513 | 16 
17 | -000687 | -000788 | ‘000902 | °001028 | 7001169 | ‘001325 | 001497 | -001687 | -001895 | 002124 | 17 
18 | -000271 | 000315 | -000366 | -000423 | 000487 | *000559 | 000640 | 000731 | *000832 | 000944 | 18 
19 | -000101 | °000119 | *000141  -000165 | 000192 °000224 | *000259  -000300  *000346 | 000397 | 19 
20 | 000036 -000043 | 000051 | 000061 | 000072 | 000085 | 000100 | -000117 | 000137 | 000159 | 20 
21 | 000012 -000015  *000018 | 000021 | °000026 | -000031.| *000037 000043 | -000051 | -000061 | 21 
22 | 000004 | -000005 -000006 | -000007 | 000009 -000011 | *000013  -000015 | -000018 | -000022 | 22 
23 | 000001, -000002  *000002 | -000002 | “000003 , 000004 | *000004 | -000005 ‘000006 | 000008 | 23 
24 = = ‘000001 | -000001 | 000001  *000001 | 000001 -000002 | -000002 | *00C003 | 2 
25 = 23 = ae = mn — 000001 | 000001 | -000001 | 25 
x | m = 8.1 | 8.2 | 8.3 | 8.4 | 8.5 | 8.6 | 8.7 | 8.8 | 8.9 | 9.0 | x
0 | 000304 | -000275 | 000249 | *000225 | 000203 | 000184 | -000167 | 000151 | 000136 | -000123 | 0 
1 | -002459 | ‘002252 ‘002063 | ‘001889 | -001729 | -001583 | -001449 | 001326 | 001214 | OOL111) 1 
2 | 009958 -009234 008560 | ‘007933 | 007350 006808 -006304 -005836  -005402 | 004998 | 2 
3 | 026885 | 025239 | 023683 | 022213 | 020826 | 019517 | ‘018283 | 017120 | -016025 | ‘014994 | 3 
4 | 054443 | 051740 049142 046648 | 044255 -041961 039765 | 037664 035656 | -033737 | 4 
5 | 088198 | 084854 081576 | 078368 | ‘075233 | 072174 069192 | -066289 | 063467 | -060727 | 5 
6 | 119067 | "115967 | "112847 | °109716 | "106581 *103449 | -100328 | 097224 | 094143 | 091090 | 6 
7 | 137778 snes 1305 “a105) "129419 | -127094 | °124693 | -129224 | 119696 | °117116| 7 
8 | -139500 | *139244 | -138823 | "138242 | 137508 136626 | 135604 | -134446 | 133161 | 131756 | 8 
9 | 125550 | *126866 | 128025 | "129026 | *129869 | -130554 *131084 | "131459 | "131682 | -1381756| 9 
10 | -101696 | *104031 | 106261 | *108382 | °110388 | -112277  *114043 | -115684 | *117197 | 118580) 10 
11 | :074885 | ‘077550 | ‘080179 | ‘082764 | ‘085300 | -087780 | °090197 | 092547 | °094823 | 097020 | 11 
12 | :050547 | -052993 | -055457 | °057935 | 060421 062909 | -065393 | -067868 | °070327 | °072765 | 12 
13 | 031495 | ‘033426 | 035407 | ‘037435 | °039506  -041617  °043763 , 045941 | -048147 | -050376 | 13 
14 | -018222 | 019578 | 020991 | (022461 | ‘023986 | 025565 -027196 | -028877 | 030608 | 032384 | 14 
15 | -009840 | 010703 | 011615 | °012578 | 013592 | -014657 ‘015773 | 016941 | -018161 | ‘019431 | 15 
16 | -004981 | 005485 -006025 | °006604 | °007221 | -007878 | 008577 | -009318 | 010102 | °010930 | 16 
17 | -002373 | 002646 002942 | 003263 | 003610  -003985 004389 | -004823 | ‘005289 | ‘005786 | 17 
18 | -001068 | 001205  -001356 | °001523 | 001705 | -001904 | 002121 | -002358 | 002615 | -002893 | 18 
19 | -000455 | -000520 | -000593 | °000673 | 000763 | -000862 | 000971 | -001092 | °001225 | 001370) 19 
20 | 000184 ee ia ‘000283 | -000324 | -000371 | 000423 | 000481 | -000545 | ‘000617 | 20 


H. E. Soper 31
TABLE—(continued). 
x | m = 8.1 | 8.2 | 8.3 | 8.4 | 8.5 | 8.6 | 8.7 | 8.8 | 8.9 | 9.0 | x
21 | 000071 | ‘000083 | ‘000097 | ‘000113 -000131 | °000152 | ‘O00L75 | *000201 | *000231 | ‘000264 | 27 
22 | -000026 | ‘000031 | -000037 | ‘000043 | -000051 | ‘000059 | 000069 | ‘000081 | 000093 | *000108 | 22 
28 | 000009 | -000011 | ‘000013 | ‘000016 | 000019 | 000022 | *000026  -000031 | -000036 | 000042 | 23 
24 | 000003 | -000004 | 000005 | ‘000006 | -000007 | ‘000008  -009009 | ‘000011 000013 | ‘000016 | 24 
25 | -000001 | 000001 | ‘000002 | :000002 ‘000002 | *000003 | -000003 | °000004 | 000005 | ‘000006 | 25 
| 26 — — —- 000001 | 000001 | *O00001 | “000001 | *QOQ0OL1 | *O00002 | -000002 | 26 
2Y = — — a = — — ‘OOO0O1 | ‘OOO0001 | 27 
x | m = 9.1 | 9.2 | 9.3 | 9.4 | 9.5 | 9.6 | 9.7 | 9.8 | 9.9 | 10.0 | x
0 | 000112 | 000101 | ‘OO0091 | -000083  -000075 | ‘000068 | *OO0061 | *000055 | *000050 | ‘000045 | U 
1 | 001016 | -000930 | ‘000850 | :000778 | ‘000711 | 000650 | ‘000594 | -000543 | 000497 | :000454 I 
2 | 004624 | 004276 | 003954 | 003655 | -003378 | 003121 | (002883  -002663 | 002459 | 002270 g 
3 | 014025 | 013113 | °012256 | 011452 | ‘010696 | °009987 | *009322 | ‘008698 | ‘008114 | ‘007567 3 
4, | 031906 | 030160 | 028496 | -026911 | -025403 | :023969 | ‘022606 | °021311 | 020082 | 0189174 
5 | 058069 | -055494 | -053002 | 050593 | -048266 | 046020 | 043855  °041770 | 039763 | °037833 | 5 
6 | °088072 | -085091 | °082154 | °079262 | ‘076421 | -073632 ‘070899 | °068224 | *065609 | °063055 6 
7 | 114493 | -111834 | *109147 | +106438 | °103714 | *100981 | ‘098246 | -095514 | °092790 | ‘090079 7 
8 | 130236 "128609 | °126883 | °125065 | 123160 | °121178 | °119123 | *117004 | *114827 | *112599 8 
9 | 131683 | °131467 | *131113 | 130623 | -130003 | 129256 | °128388 | *127405 | °126310] 1251109 
10 | *119832 | *120950 | °121935 | *122786 | °123502 | °124086 | °124537 | °124857 | -°125047 | -125110 10 
11 | :099188 | -101158 | *103090 | *104926 | -106661 | *108293 | ‘109819 | *111236  *112542 | 113736 11 
12 | ‘075176 | ‘077555 | ‘079895 | -082192 | -084440 | 086634 | ‘088770 | -090848 | °092847 | 094780 12 
18 | 052628 | :054885 | ‘057156 | 059431 | -061706 | (063976 | ‘066236 | 068481 | ‘070707 | (072908 13 
14 | °034205 | :036067 | °037968  -039904 | ‘041872 | °043869 | °045892 | 047937 | “050000 | °052077 14 
15 | 020751 | 022121 | °023540  *025006 | 026519 | 028076 | ‘029677 | ‘031319 | ‘033000 | ‘034718 15 
16 | °011802 | 012720 | ‘013683  -014691 | -015746 | -016846 | ‘017992  -019183 | °020419 | -021699 16 
17 | (006318 | -006884 | 007485 | 008123 | :008799 | 009513 | -010266 | -011058 | °011891 | ‘012764 17 
18 | 003194 | 003518 | -003867 004242 | 004644 | 005074 | 005532 | ‘006021 | ‘006540 | ‘007091 18 
19 | (001530 | -001704 | 001893 ‘002099 | 002322 | -002563 | °002824 | °003105 | °003408 | 003732 | 19 
20 | :000696 | 000784 | ‘000880 | 000986 | -001103 | -001230 | *001370 | °001522 | *001687 | 001866 | 20 
21 | 000302 | 000343 | ‘000390 000442 -000499 | 000563 | 000633 | 000710 | *000795 | ‘000889 | 21 
22 | 000125 | 000144 | °000165  *O000189 | 000215 | 000245 000279 | *000316 | ‘000358 | -000404 | 22 
23 | 000049 | -000057 | ‘000067 = “000077 000089 | -000102  *000118 | *000135 | 000154 | -000176 | 23 
24 | 000019 | -000022 | ‘000026  -000030 | 000035 | ‘000041 | *000048 | *000055 | ‘000064 | -000073 | 24 
| 25 | ‘000007 | -000008 | 000010 | -O00011 | *000013 | ‘000016 | “000018 | ‘000022  *000025 | -000029 | 25 
| 26 | 000002 | -000003 | *000003 | :000004 | ‘000005 | ‘000006 | *O00007 | *O00008 000010 | 000011 | 26 
27 | "000001 | 000001 | *OOO001 | *O00001 | *000002 | -000002 | *O00002 | "000008 *000004 | 000004 | 27 
28 — —- | = — 000001 | ‘000001 | ‘000001 | 000001 000001, -000001 | 28 
29 | — | — | — | — | — | — | — | — | — | .000001 | 29
x | m = 10.1 | 10.2 | 10.3 | 10.4 | 10.5 | 10.6 | 10.7 | 10.8 | 10.9 | 11.0 | x
0 | -000041 | :000037 | -000034 -000030 -000028 | -000-25 | ‘000023 | -000020 | -000018 | *OO0017 0 
1 | 000415 | *000379 | 000346 | 000317. °000289 | -000264 | -000241 | :000220 | 000201 | ‘000184 7 
2 | 002095 | 001934 | 001784  -001646 | -001518 | :001400 | 001291 | -001190 | -001097 ; 001010 | 2 
3 | 007054 | 006574 | °006125 | -005705 | 005313 | 004946 | 004603 | 004283 | -003984 | *008705 | 3 


32 Poisson's Exponential Binomial Limit
TABLE—(continued). 
x | m = 10.1 | 10.2 | 10.3 | 10.4 | 10.5 | 10.6 | 10.7 | 10.8 | 10.9 | 11.0 | x
4 | O17811  -016764 | -015773 | 014834 | -013946 | -013107 | -012313 | -011564 | 010856 | -010189 |] 4 
5 | 085979 | °084199 | °032492 | 030855 | *029287 | ‘027786 | °026350 | 024978 | 023667 | 022415 | 5 
6 | °060565 | -058139 | (055777 | 053482 -051252 049089 | 046991 | -044960 | 042995 041095 | 6 
7 | 0873887 | 084716 | 082072 | 079458  °076878 | (074334 | 071830 | 069367 | 066949 064577, 7 
8 | °110326 | *108013 | *105668 | 103296  *100902 | -098493 | ‘096072 -093646 | (091218 | ‘088794 | 8 
9 | °123810 | 122415 | °120931 | -119364 “117720 | *116003 | °114219  -112375 | -110475 °108526; 9 
10 | 125048 | °124863 | °124559 | 124139 +123606  +122963 | -122215 | -121365 | -120418 | 119378 | 70 
17 | 114817 | *115782 | °116633 | °117368 | -117987 | *118492 | -118882 | *119159 | -119323 | -119378 | 71 
12 | 096637 | 098415 | *100110 | °101719 | *103239 | :104667 | *106003 | +107243 | :108386 | 109430 | 12 
13 | ‘075080 | ‘077218 | ‘079318 | -081375 | ‘083385 | °085344 | ‘087248 | -089094 | 090877 | °092595 | 13 
14 | °054165 | °056259 | 058355 | *060450 | -062539 | °064618 |} 066683 | °068730 | ‘070754 | 072753 | 14 
15 | °0B6471 | °038256 | 040071 | 041912 | °043777 | °045663 | 047567 | -049485 | 051415 053352 | 15 
16 | °023022 | 024388 | 025795 | °027243 | °028729 | °030252 | -031810 | 033403 | 035026 ‘036680 | 76 
17 | (013678 | 014633 | °015629 | 016666 | ‘017744 | °018863 | 020022 | -021220 | 022458 | 023734 | 17 
18 | :007675 | *008292 | ‘008943 | -009629 | -010351 | °011108 | -011902 | :012732 ‘013600 | 014504 | 78 
19 | ‘004080 | 004451 | *004848 | *0054%71 | -005720 | -006197 | °006703 | -007237 | 007802 | -008397 | 19 
| 20 | °002060 | -002270  :002497 | -002741 003003 | 003285 | -003586 | 003908 | -004252 | 004618 | 20 
21 | 000991 | °001103 | *001225 | ‘001357 | *001502 | °001658 | -001827 | °002010 | 002207 | ‘002419 | 21 
22 | 000455 | -000511 | 000573 | 000642 | 000717 | 000799 | :000889 | -000987 | -001093 | °001210 | 22 
23 | -000200 | -000227 | -000257 | 000290 -000327 | -000368 | 000413 | 000463 | -000518 | ‘000578 | 23 
24 | 000084 -000096 | -000110 | 000126 -000143 | 000163 | -000184 000208 | -000235  -000265 | 2 
25 | 000034 | 000039 | *000045 | *000052 , 000060 | *000069 | ‘000079 | -000090 | 000103 | ‘000117 | 25 
26 000013 | 000015 *000018 | -000021 | 000024 -000028 | -000032 000037 | -000043 | 000049 | 26 
27 | 000005 | 000006 | 000007 | *000008 | 000009 | :000011 | *000013 | :000015 | 000017 | *000020 | 27 
28 000002 | -000002 | *0000038 | *000003 | -000004 | *000004 | :000005 | *000006 | -000007 | -000008 | 28 
29 | 000001 | ‘000001 | -000001 | °000001 | 000001 | 000002 | *000002 | -000002 | -000003 | 000003 | 29 
3O — _ — a — 000001 | ‘000001 | -000001 | *000001 | ‘000001 | 30 
x | m = 11.1 | 11.2 | 11.3 | 11.4 | 11.5 | 11.6 | 11.7 | 11.8 | 11.9 | 12.0 | x
0 | 000015 | 000014 | ‘000012 | ‘000011 | ‘000010 | *000009 | °000008 | °000008 | *000007 ‘000006 |} 0 
Z | -000168 | °000153 | 000140 | 000128 | ‘000116 | *000106 | ‘000097 | -000089 | ‘000081 | ‘000074 | 7 
2 | 000931 | -000858 | 000790 | 000727 | 000670 | ‘000617 | 000568 | 000522 | 000481 | 000442 | 2 
3 | 003445 | -003202 | 002976 | 002764 | ‘002568 | 002385 | -002214 | -002055 | 001907 | ‘001770 | 3 
4 | 009559 | 008965 | -008406 | 007879 | 007382 | 006915 | 006476 | 006062 | -005674 | 005309 | 4 
5 | °021221 | 020082 | -018997 | 017963 016979 | °016043 | 015153 | ‘014307 | -013504 | ‘012741 | 5 
G | ‘039259 | °037487 | -035778 | 7034130 | 032544 | -031017 | °029549 | 028137 | -026782 | 025481 6 
7 | 062253 | -059979 | -057755 | 055584 | 053465 | 051400 | 049388 | -047432 | 045530 | 043682 | 7 
8 | 086376 | °083970 | 081579 | -079206 | ‘076856 | 074529 | 072231 | -069962 | 067725 | ‘065523 | 8 
9 | 106531 | -104496 | 102427 | -100328 | 098204 | 096060 | -093900 | °091728 | 089548 | ‘087364 | 9 
10 | “118249 | -117036 | 115743 | 114374 | +112935 | *111430 | *109863 | 108239 "106562 | °104837 | 10 
11 | 119324 | +119164 | °118899 | °118533 | -118068 | *117508 | °116854 | *116110 | -115281 | 114368 | 11 
12 | *110375 | 111220 | +111964 | *112607 | *113149 | °113591 | +113933 | -114175 | 114320 | *114363 | 72 
13 | °094243 | 095820 | -097322 | 098747 | -100093 | °101358 | -102539 | -103636 | -104647 | 105570 | 13 
14 | 074721 | (076656 | -078553 | -080409 | -082219 | 083982 | 085694 | :087350 | -088950 | 090489 | 14 
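
Each entry of the foregoing table is a term $e^{-m} m^x / x!$ of the exponential series, given to six decimal places. As a minimal sketch for checking a doubtful figure against the print, the following may serve; the function names `poisson_term` and `table_row` are illustrative assumptions and form no part of the original table.

```python
import math

def poisson_term(m: float, x: int) -> float:
    """Term e^{-m} m^x / x! of the Poisson exponential series."""
    return math.exp(-m) * m**x / math.factorial(x)

def table_row(x: int, means: list[float]) -> list[float]:
    """One row of the table: the term for count x at each mean m, to 6 places."""
    return [round(poisson_term(m, x), 6) for m in means]

# Column headings of one block of the table, here m = 10.1, 10.2, ..., 11.0.
means = [round(10.1 + 0.1 * k, 1) for k in range(10)]

for x in (0, 1, 2, 3):
    print(x, table_row(x, means))
```

A garbled cell can be checked by calling `poisson_term` with the column heading m and the row label x.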
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ON THE POISSON LAW OF SMALL NUMBERS. 


By LUCY WHITAKER, B.Sc.


PART I. THEORY AND APPLICATION TO CELL-FREQUENCIES. 


(1) Introductory. 


Let p denote the probability of the happening of a certain event A, and
q = 1 − p, the probability of its failure in one trial. Then it is well known that
the distribution of the frequencies of occurrence n, n − 1, n − 2 ... times in a series
N of n trials is given by the terms of the point binomial

$$N(p+q)^n \qquad \text{(i)}.$$

The fitting of point-binomials plotted on an elementary base c to observed 
frequency distributions has been discussed by Pearson*, and he has indicated that, 
if c be unknown, the problem can be solved in terms of the three moment coefficients 
$\mu_2$, $\mu_3$, $\mu_4$ required to find c, p and n. In actual practice but few cases of frequency
can be found which are describable in terms of a point-binomial, and of these few
a considerable section have n negative, p greater than unity and q negative; thus
defying at present interpretation, however well they may serve as an analytical
expression of the frequency.


The hypothesis made in deducing the binomial $(p+q)^n$ as a description of
frequency is clearly that each trial shall be absolutely independent of those which
precede it. In this respect it may be said that binomial frequencies belong to the
teetotum class of chances, and not to those of card-drawings, when each drawing
is unreplaced. In the latter case the "contributory cause groups are not
independent," and our series corresponds to the hypergeometrical rather than to the
binomial type of progression†.

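Using the customary notation $\beta_1 = \mu_3^2/\mu_2^3$, $\beta_2 = \mu_4/\mu_2^2$, the binomial is determined
from:

$$n = \frac{2}{3 - \beta_2 + \beta_1}, \qquad c = \sigma\sqrt{6 - 2\beta_2 + 3\beta_1}, \qquad pq = \frac{1}{2}\,\frac{3 - \beta_2 + \beta_1}{6 - 2\beta_2 + 3\beta_1} \qquad \text{(ii)}.$$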


* "Skew Variation in Homogeneous Material," Phil. Trans. Vol. 186, A, p. 347, 1895.
† Phil. Trans. Vol. 186, A, p. 381, 1895.
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In order that n should be positive, it is needful that 
$$3 - \beta_2 + \beta_1 = \tfrac{1}{2}(6 - 2\beta_2 + 2\beta_1)$$

should be positive. If this is satisfied clearly c will be real because $\beta_1$ is always
positive. Further then

$$pq = \frac{1}{4}\,\frac{6 - 2\beta_2 + 2\beta_1}{6 - 2\beta_2 + 3\beta_1}$$

is always less than a quarter and p and q will therefore be real. If the reader
will turn to Rhind's diagram, Biometrika, Vol. VII. p. 131, he will see that the line
$3 - \beta_2 + \beta_1 = 0$ cuts off all curves of Types III, IV, V and VI, and includes a
portion only of Type I, with a part of its U and J varieties. The binomial 
description of frequency, therefore, is not—considering our experience of frequency 
distributions—likely to be of very universal application. 
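
The determination of the binomial from $\beta_1$, $\beta_2$ and $\sigma$ can be put as a short numerical sketch. The function names below are illustrative assumptions, not anything given in the text. Since only the product pq is fixed by the moments used here, the sketch returns both roots for p and q without deciding which is which. The check at the foot builds $\beta_1$ and $\beta_2$ from a known binomial and recovers its constants.

```python
import math

def betas_of_binomial(n: float, p: float) -> tuple[float, float]:
    """beta_1 and beta_2 of a point binomial (p + q)^n on unit base."""
    q = 1.0 - p
    npq = n * p * q
    beta1 = (p - q) ** 2 / npq
    beta2 = 3.0 + (1.0 - 6.0 * p * q) / npq
    return beta1, beta2

def binomial_from_betas(sigma: float, beta1: float, beta2: float):
    """Return n, base unit c, product pq and the two roots for p and q."""
    a = 3.0 - beta2 + beta1              # equals 2/n
    d = 6.0 - 2.0 * beta2 + 3.0 * beta1  # equals 1/(npq)
    if a <= 0.0:
        raise ValueError("3 - beta2 + beta1 is not positive: n would be negative")
    n = 2.0 / a
    c = sigma * math.sqrt(d)
    pq = 0.5 * a / d
    root = math.sqrt(max(0.0, 1.0 - 4.0 * pq))  # pq < 1/4, so the roots are real
    return n, c, pq, ((1.0 - root) / 2.0, (1.0 + root) / 2.0)

# Round trip on a known binomial with n = 12, p = 0.3 and unit base.
b1, b2 = betas_of_binomial(12, 0.3)
sigma = math.sqrt(12 * 0.3 * 0.7)
print(binomial_from_betas(sigma, b1, b2))
```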


(2) Further Limitations.


Now let us still further limit our binomial by supposing : 


(i) that the unit of grouping of the observed frequencies corresponds to the 
actual binomial base unit c and (ii) that the first of the observed frequencies 
corresponds to the term $Np^n$ of the binomial*.


In this case the mean m of the observed frequency measured from the first 
term of the frequency will be equal to the nq of the binomial and the standard 
deviation of the observed distribution will be equal to $\sqrt{npq}$. We have thus:

$$p = \sigma^2/m, \qquad q = 1 - \sigma^2/m, \qquad n = m^2/(m - \sigma^2) \qquad \text{(iii)},$$
and n and q will both be negative, if m be less than $\sigma^2$. The condition for a
positive binomial is therefore that $\sigma$ be less than $\sqrt{m}$.
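
Since m = nq and $\sigma^2 = npq$, the constants of the limited binomial follow at once from the observed mean and variance. The sketch below, in which the function name `limited_binomial` is an illustrative assumption, shows one case with a positive binomial and one in which the variance exceeds the mean, so that n and q come out negative.

```python
def limited_binomial(mean: float, variance: float) -> tuple[float, float, float]:
    """Solve m = nq and sigma^2 = npq for (p, q, n)."""
    if mean == variance:
        raise ValueError("mean equals variance: q = 0 and n is infinite")
    p = variance / mean
    q = 1.0 - p
    n = mean / q  # the same as mean**2 / (mean - variance)
    return p, q, n

# Variance less than the mean: a positive binomial.
print(limited_binomial(3.0, 2.1))
# Variance greater than the mean: n and q both negative.
print(limited_binomial(3.0, 4.0))
```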

(3) Probable errors of the constants of a Binomial Frequency.

It is desirable to find the probable errors of p and n as determined by these 
formulae. We have: 

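$$\mu_1' = nq, \qquad \mu_2 = npq,$$
$$\delta\mu_1' = q\,\delta n + n\,\delta q, \qquad \delta\mu_2 = pq\,\delta n + nq\,\delta p + np\,\delta q,$$
assuming deviations may be represented by differentials.
Hence, since $\delta p = -\delta q$:
$$\delta\mu_2 - (p - q)\,\delta\mu_1' = q^2\,\delta n \quad \text{and} \quad p\,\delta\mu_1' - \delta\mu_2 = nq\,\delta q.$$
Square each of these results, sum for all samples and divide by the number of
samples, and we have:
$$\sigma_{\mu_2}^2 + (p - q)^2\sigma_{\mu_1'}^2 - 2(p - q)\,\sigma_{\mu_2}\sigma_{\mu_1'}r_{\mu_2\mu_1'} = q^4\sigma_n^2,$$
$$p^2\sigma_{\mu_1'}^2 + \sigma_{\mu_2}^2 - 2p\,\sigma_{\mu_1'}\sigma_{\mu_2}r_{\mu_1'\mu_2} = n^2q^2\sigma_q^2.$$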

* The exact nature of these limitations must be fully appreciated. The best fitting binomial to 
a given frequency distribution will usually be far from one in which the first term of the binomial 
corresponds to the first observed frequency. The modes of the binomial and the observed frequency 


will closely correspond, but the ‘‘tails” of the binomial may be quite insignificant and correspond to no 
observed frequencies. 
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Now (ai is the standard deviation of variations in ~. and therefore 
$$\sigma_{\mu_2}^2 = (\mu_4 - \mu_2^2)/N.$$
Similarly $\sigma_{\mu_1'}$ is the standard deviation of variations in the mean and therefore
$\sigma_{\mu_1'}^2 = \mu_2/N$. Lastly the product $\sigma_{\mu_2}\sigma_{\mu_1'}r_{\mu_2\mu_1'}$ measures the correlation between
deviations in $\mu_2$ and $\mu_1'$ and is known to be $\mu_3/N$*.
Thus we have: 


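$$q^4\sigma_n^2 = \frac{1}{N}\{\mu_4 - \mu_2^2 + (p - q)^2\mu_2 - 2(p - q)\mu_3\},$$

$$n^2q^2\sigma_q^2 = \frac{1}{N}\{\mu_4 - \mu_2^2 + p^2\mu_2 - 2p\mu_3\}.$$

But†
$$\mu_4 = npq\{1 + 3(n - 2)pq\}, \qquad \mu_3 = npq(p - q), \qquad \mu_2 = npq \qquad \text{(iv)}.$$
Whence after some purely algebraical reductions we deduce:
$$\sigma_n = \frac{n}{\sqrt{N}}\,\frac{p}{q}\sqrt{2\left(1 - \frac{1}{n}\right)} \qquad \text{(v)},$$
$$\sigma_p = \sigma_q = \sqrt{\frac{2}{N}}\sqrt{p\left\{p + \frac{1 - 3p}{2n}\right\}} \qquad \text{(vi)}.$$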


Formulae (v) and (vi) are very important; they enable us to obtain the 
probable errors for n and p when a binomial limited in the present manner is
fitted to a frequency distribution. 


We see at once, that as n grows large and q grows small
$\sigma_p = \sigma_q$ approaches the limit $\sqrt{2/N}$,
or the probable error, $.67449\sqrt{2/N}$, of p and q is finite. But $\sigma_q$ being finite $\sigma_n$
becomes infinitely great, or the probable error of n indefinitely large. Thus when
the n of the binomial is very large, q being very small, the probable error of its
determination is so great that its actual value is not capable of being found
accurately. Again, suppose N embraced 200 observations, the probable error of q
would be of the order .07; if N corresponded to only eighteen observations, then
the probable error of q would be of the order .22. It is clearly wholly impossible


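* Biometrika, Vol. II. "On the Probable errors of Frequency Constants," see p. 275 (iv), p. 276 (vii),
and p. 279 (xii).
† Phil. Trans. Vol. 186, A, p. 347, 1895.
‡ There is no difficulty in obtaining the probable errors of n and p from the more general values
in (ii). In this case

$$\sigma_n = \tfrac{1}{2}n^2\sqrt{\sigma_{\beta_1}^2 + \sigma_{\beta_2}^2 - 2\sigma_{\beta_1}\sigma_{\beta_2}r_{\beta_1\beta_2}},$$

$$\sigma_p = \sigma_q = \frac{\sqrt{(\beta_2 - 3)^2\sigma_{\beta_1}^2 + \beta_1^2\sigma_{\beta_2}^2 - 2\beta_1(\beta_2 - 3)\,\sigma_{\beta_1}\sigma_{\beta_2}r_{\beta_1\beta_2}}}{2\sqrt{\beta_1}\,(6 - 2\beta_2 + 3\beta_1)^{3/2}}.$$

The values of $\sigma_{\beta_1}$, $\sigma_{\beta_2}$ and $r_{\beta_1\beta_2}$ for different values of $\beta_1$ and $\beta_2$ have been tabled by Rhind, Biometrika,
Vol. VII. pp. 136-141.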
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from series of observations even of the order 200, much less of order 18, to assert 
that q is or is not really a "small quantity." Thus the observed value of q
corresponding to a population of extremely small q might easily show q = .15 to .50!
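
The orders of magnitude just quoted can be checked by a short sketch. It assumes the reduced forms $\sigma_n = (p/q)\sqrt{2n(n-1)/N}$ and $\sigma_q = \sqrt{p\{1 + (2n-3)p\}/(nN)}$, which follow algebraically from the binomial moments $\mu_2 = npq$, $\mu_3 = npq(p-q)$, $\mu_4 = npq\{1 + 3(n-2)pq\}$. The function names are illustrative assumptions only.

```python
import math

PROBABLE_ERROR_FACTOR = 0.67449  # probable error = 0.67449 x standard deviation

def sigma_n(n: float, p: float, N: int) -> float:
    """Standard deviation of n for the limited binomial, N observations."""
    q = 1.0 - p
    return (p / q) * math.sqrt(2.0 * n * (n - 1.0) / N)

def sigma_q(n: float, p: float, N: int) -> float:
    """Standard deviation of q (and of p) for the limited binomial."""
    return math.sqrt(p * (1.0 + (2.0 * n - 3.0) * p) / (n * N))

def limiting_probable_error_of_q(N: int) -> float:
    """Limit of the probable error of q as n grows large and q small."""
    return PROBABLE_ERROR_FACTOR * math.sqrt(2.0 / N)

for N in (200, 18):
    print(N, round(limiting_probable_error_of_q(N), 3))

# With q very small the error of q stays finite while that of n is enormous.
print(sigma_q(1e6, 1 - 1e-5, 200), sigma_n(1e6, 1 - 1e-5, 200))
```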


(4) Poisson—Law of Small Numbers. 


A last limitation of the point-binomial is made by supposing the mean m = nq
to remain finite, but q to be indefinitely small. We write:

$$N(p+q)^n = N(1 - q + q)^n = N(1-q)^n\left(1 + \frac{q}{1-q}\right)^n$$

$$= N(1-q)^{m/q}\,(1+q)^{m/q} \text{ nearly}$$

$$= N e^{-m}\left(1 + m + \frac{m^2}{2!} + \frac{m^3}{3!} + \dots\right).$$


Here the successive terms give the frequency of occurrence of 0, 1, 2, 3... 
successes on the basis of each success not being prejudiced by what has previously 
occurred. This is the Law of Small Numbers. It was first published by Poisson 
in 1837*. It was adopted later by Bortkewitsch, who published a small treatise 
expanding by illustrations Poisson's work†. The same series was deduced later
by "Student" in ignorance of both Poisson and Bortkewitsch's papers, when
dealing with the counts made with a haemacytometer‡.


The mean is at m from the first group, the other moments as "Student" has
shewn§ are:
$$\mu_2 = m, \qquad \mu_3 = m, \qquad \mu_4 = 3m^2 + m.$$
Hence $\beta_1 = 1/m$, $\beta_2 - 3 = 1/m$.


When the mean value is large, $\beta_1$, $\beta_2$ and the higher $\beta$'s approach the values
given by the Gaussian curve.
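
These moments can be verified numerically by summing the exponential series directly. The following is a minimal sketch; the function names and the truncation point `kmax` are assumptions made for illustration, the series being cut off where its terms have become negligible.

```python
import math

def poisson_terms(m: float, kmax: int = 200) -> list[float]:
    """Terms e^{-m} m^x / x! for x = 0..kmax, built up by recurrence."""
    terms = [math.exp(-m)]
    for x in range(1, kmax + 1):
        terms.append(terms[-1] * m / x)
    return terms

def moments_about_mean(m: float, kmax: int = 200):
    """Return the mean and mu_2, mu_3, mu_4 of the truncated series."""
    terms = poisson_terms(m, kmax)
    mean = sum(x * t for x, t in enumerate(terms))

    def mu(r: int) -> float:
        return sum((x - mean) ** r * t for x, t in enumerate(terms))

    return mean, mu(2), mu(3), mu(4)

m = 2.5
mean, mu2, mu3, mu4 = moments_about_mean(m)
beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
print(mean, mu2, mu3, mu4)       # expected m, m, m, 3m^2 + m
print(beta1, beta2 - 3.0, 1 / m)
```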


Clearly the Poisson-Exponential formula contains only the single constant
$m = \mu_2$, and its probable error is therefore $.67449\,\sigma/\sqrt{N} = .67449\sqrt{m/N}$. This will,
if N be reasonably large and m not too big, be a small or at any rate a finite
quantity (i.e. not like $\sigma_n$ for q very small). Hence it might be supposed, although
erroneously, that the Poisson-Exponential formula was capable of great accuracy 
in addition to its great simplicity. But this is to neglect the fundamental 
assumptions on which it is based, namely: 


(i) that the data actually correspond to a binomial, 
(ii) that in that binomial q is small and n large.


Clearly (i) shows us that, if we can find the binomial, it will actually be closer 
to the observed frequency than Poisson’s merely approximate formula. 


* Recherches sur la Probabilité des Jugements. Paris, 1837, pp. 205 et seq.

† Das Gesetz der kleinen Zahlen, Leipzig, 1898.

‡ "On the Error of Counting with a Haemacytometer," Biometrika, Vol. V, pp. 351-5, 1907.
§ They may be deduced at once from (iv).
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Secondly (11) can only be justified as an assumption by actually ascertaining 
the form of the binomial from the data and testing whether n is large and q small 
and positive. It appears absurd to base our formula on an approximation to 
a binomial of a particular kind when, on testing in the actual problem, such a 
binomial does not describe the results. As a merely empirical formula, the 
Poisson-Exponential of course can be tested by the usual processes for measuring 
goodness of fit, but no such test nor any discussion of the probable errors of their 
results have been provided by Bortkewitsch himself nor by Mortara, who has 
followed recently his lines in a work to be considered later. As a matter of fact in
the cases dealt with by Bortkewitsch, by Mortara and by “Student,” n will be found 
almost as frequently small and negative as large and positive, and q takes a great 
variety of values large and negative and large and positive, as well as small 
and positive. Thus the initial assumptions made from which the “law of small 
numbers” is deduced are by no means justified on the material to which it has so 
far been applied. 


(5) Application of the Law of Small Numbers to determine the Probable Errors 
of Small Frequencies. Given a distribution of frequency for a population $\bar{N}$ let $\bar{n}_{st}$
be the frequency in the cell of the sth row and tth column of a contingency table
(or if we drop t, $\bar{n}_s$ would stand for the frequency of any class). Then if we take
a random sample of N individuals from this population, the chance that an
individual is taken out of the $\bar{n}_{st}$ cell is $\bar{n}_{st}/\bar{N}$, and that it is not is $1 - \bar{n}_{st}/\bar{N}$. Therefore if
the original population be so large that the withdrawal of an individual does not
affect the next draw, the frequency of individuals in M random samples of N will
be given by the terms of the binomial:

$$M\left\{\left(1 - \frac{\bar{n}_{st}}{\bar{N}}\right) + \frac{\bar{n}_{st}}{\bar{N}}\right\}^N.$$

Now, if $\bar{n}_{st}/\bar{N}$ be very small, and N large this will approximate to the
Poisson series:
$$M e^{-m}\left(1 + m + \frac{m^2}{2!} + \frac{m^3}{3!} + \dots\right),$$

where $m = (\bar{n}_{st}/\bar{N}) \times N$. But $\bar{n}_{st}/\bar{N}$ will approximately be the mean proportion of the
whole in the st cell of the sample itself $= n_{st}/N$, or $m = n_{st}$. Thus if in any cell of
a contingency table, or in any sub-class of a frequency whatsoever, we have a
frequency $n_{st}$ small as compared to the population N, then in sampling, this small
frequency will have a distribution approximating to the Poisson Law, and tending
as $n_{st}$ becomes larger to approach the Gaussian distribution*. It would appear,


* Such approach is usually assumed when we speak of

$$.67449\sqrt{n_s\left(1 - \frac{n_s}{N}\right)}$$

as the probable error of the frequency $n_s$. But such a "probable error" has really no meaning if $n_s$
be very small and the exponential law be applied.
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therefore, that the Poisson Law of Small Numbers should be applied in order to 
deal with the errors of random sampling in any small frequency, and an appeal 
should not be made—as is usually the case—to Sheppard’s Tables on the assump- 
tion that the frequency is Gaussian. 
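
How closely the exponential series follows the binomial for a small cell-frequency can be seen from a short computation. The sketch below takes a cell holding 10 in a population of 1000 and samples of 1000, so that the chance is .01 and m = 10. The function names are illustrative assumptions, and the Gaussian column is left out because it depends on which Gaussian is chosen for the comparison.

```python
import math

def binomial_term(trials: int, chance: float, x: int) -> float:
    """Chance of exactly x individuals from the cell in a sample of `trials`."""
    return math.comb(trials, x) * chance**x * (1.0 - chance) ** (trials - x)

def poisson_term(m: float, x: int) -> float:
    """Corresponding term e^{-m} m^x / x! of the exponential series."""
    return math.exp(-m) * m**x / math.factorial(x)

def compare(cell: int, population: int = 1000, sample: int = 1000, xs=range(0, 23)):
    """Rows (x, binomial, Poisson) for a cell holding `cell` of `population`."""
    chance = cell / population
    m = chance * sample
    return [(x, binomial_term(sample, chance, x), poisson_term(m, x)) for x in xs]

for x, b, p in compare(10):
    print(f"{x:2d}  {b:.5f}  {p:.5f}")
```

Calling `compare(30, xs=range(19, 43))` gives the corresponding figures for a cell holding 30 in the 1000.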


The following Table I illustrates the results obtained (a) from the Binomial, 
(b) from the Poisson-Exponential and (c) from the normal curve on the two 
hypotheses that (α) the frequency is 10 in the 1000 and (β) is 30 in the 1000.
But here a word must be said as to which Gaussian is to be compared with the 
Binomial or the Poisson-Exponential. The usual method of fitting a Gaussian is 
to give it the same mean and standard-deviation as the material to which we are 
fitting it. For example, we should compare the Poisson exponential with a Gaussian 
at mean m and with standard-deviation $\sqrt{m}$, or the point binomial with mean nq


TABLE I.

Comparison of Binomial, Poisson-Exponential and Gaussian for cell-frequency
variations in samples for case of 10 and 30 in a total population of 1000.

Percentage Frequency.

10 in 1000

   | Binomial | Poisson-Exponential | Gaussian
 0 | .00004 | .00005 | .00132
 1 | .00044 | .00045 | .00327
 2 | .00220 | .00227 | .00735
 3 | .00739 | .00757 | .01491
 4 | .01861 | .01892 | .02736
 5 | .03745 | .03783 | .04539
 6 | .06274 | .06306 | .06806
 7 | .08999 | .09080 | .09224
 8 | .11282 | .11260 | .11300
 9 | .12561 | .12511 | [illegible]
10 | .12574 | .12511 | .12526
11 | .11431 | .11374 | .11334
12 | .09516 | .09478 | .09271
13 | .07305 | .07291 | .06854
14 | .05202 | .05208 | .04580
15 | .03454 | .03472 | .02767
16 | .02148 | .02170 | .01511
17 | .01256 | .01276 | .00746
18 | .00693 | .00709 | .00333
19 | .00362 | .00373 | .00134
20 | .00179 | .00187 | .00049
21 | .00085 | .00089 | .00016
22 | .00038 | .00040 | .00005
23 | .00016 | .00018 | .00001

30 in 1000

   | Binomial | Poisson-Exponential | Gaussian
19 | .00848 | .00894 | .01100
20 | .01287 | .01341 | .01553
21 | .01857 | .01916 | .02118
22 | .02556 | .02613 | .02792
23 | .03362 | .03408 | .03544
24 | .04233 | .04260 | .04373
25 | .05110 | .05112 | .05198
26 | .05927 | .05898 | .05970
27 | .06613 | .06553 | .06625
28 | .07107 | .07021 | .07104
29 | .07367 | .07263 | .07360
30 | .07375 | .07263 | .07367
31 | .07137 | .07029 | .07126
32 | .06684 | .06590 | .06659
33 | .06064 | .05991 | .06013
34 | .05334 | .05286 | .05246
35 | .04553 | .04531 | .04423
36 | .03775 | .03776 | .03602
37 | .03042 | .03061 | .02835
38 | .02384 | .02417 | .02156
39 | .01819 | .01859 | .01584
40 | .01351 | .01394 | .01125
41 | .00979 | .01020 | .00771
42 | .00691 | .00729 | .00511
and standard-deviation $\sqrt{npq}$. These will, however, not be identical standard
deviations as $p$ is not truly unity. In ordinary practice, in testing for example the
30 in 1000 frequency, we should put the centre of our Gaussian at our 30 group,
and use a standard deviation $= \sqrt{30\,(1 - 30/1000)} = \sqrt{30 \times .97} = 5.39444$ to enter
the table of the probability integral. This is, of course, the Gaussian we obtain
by the method of least squares, but to assume that it is "the best" is to argue in
a circle, because we then take least squares as a test of what is best*. It is
not the Gaussian which is directly reached by proceeding either to a limit of the
Binomial or to the Exponential, for example, by applying Stirling's Theorem. It
will be seen by examining Table II that the Gaussian curve develops out of the
exponential by a mode at the point midway between the two equal terms, rather
than by a mode at the mean, which coincides with the centre of the second of
them. If we apply Stirling's Theorem to the term†
$$u_r = N \frac{n!}{(n-r)!\, r!}\, p^{n-r} q^{r}$$
of the binomial $N (p+q)^n$ it becomes
$$u_r = \frac{N}{\sqrt{2\pi}\,\sqrt{npq}}\; e^{-\frac{1}{2}\frac{\{r - nq + \frac{1}{2}(p-q)\}^2}{npq}},$$
i.e. the ordinate of a Gaussian curve of Standard Deviation $\sqrt{npq}$ and mean at
$nq - \frac{1}{2}(p-q)$. These give for the Poisson-Exponential the Gaussian with standard-
deviation $\sqrt{m}$ and mean $m - \frac{1}{2}$. The above type of curve which gives frequencies
by ordinates and not by areas has been termed by Sheppard a 'spurious curve
of frequency'; at the same time it is the method by which Laplace and Poisson
first reached the normal curve, and the real point at issue is whether we shall get
better approximations to the discontinuous frequencies of the binomials by using
Gaussian ordinates than by using the areas of a Gaussian curve. At the same
time it has been shewn‡ that if a Gaussian curve gives a series of frequencies by
its areas, then if its standard-deviation be $\sigma$, a spurious Gaussian frequency curve
with standard deviation given by $\sigma_s^2 = \sigma^2 + \frac{1}{12}h^2$, $h$ being the sub-range, will closely
give the frequencies by its ordinates. It seems probable therefore that the
Gaussian curve with mean at $nq - \frac{1}{2}(p-q)$ and standard deviation $\sqrt{npq - \frac{1}{12}}$
will more closely represent the binomial for cell frequency variation by its areas,


* There is a further flaw in this treatment: the Gaussian is continuous, the Binomial and the
Poisson-Exponential are not. If $t_r$ be the $r$th term of either of the latter series, we ought really to
make
$$u = S \left\{ t_r - \int_{r-m}^{r+1-m} \frac{N}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}}\, dx \right\}^2$$
a minimum by the conditions $du/dm = du/d\sigma = 0$. No complete solution of this problem has hitherto
been determined.

† The final form for $u_r$ may be obtained by neglecting the terms in n in the formula given by
Pearson, Phil. Trans. Vol. 186, A, p. 347, footnote.
‡ Biometrika, Vol. III. p. 311.
than if we apply the ordinary process of mean $nq$, standard deviation $\sqrt{npq}$, and
Sheppard's table for areas to the frequencies. It will be noted that this amounts
to using Sheppard's correction on the crude second-moment and slightly shifting
the central ordinate towards the side of greater frequency. This is the Gaussian
curve used in Table I.
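
The three columns of Table I can be recomputed from the definitions given above. The following is a minimal sketch in Python, not the authors' own procedure: the function names are illustrative, and the Gaussian column is assumed to be the area over the unit cell centred at $r$, with mean $nq - \frac{1}{2}(p-q)$ and standard deviation $\sqrt{npq - \frac{1}{12}}$ as described in the text.

```python
import math

def binomial_pmf(r, n, q):
    """Term of the point binomial (p + q)^n for a cell frequency r."""
    return math.comb(n, r) * q**r * (1.0 - q)**(n - r)

def poisson_pmf(r, m):
    """Term of the Poisson-Exponential with mean m (computed in logs)."""
    return math.exp(-m + r * math.log(m) - math.lgamma(r + 1))

def normal_cdf(x):
    """Probability integral of the standard Gaussian."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gaussian_area(r, n, q):
    """Area of the Gaussian over the cell (r - 1/2, r + 1/2).

    Assumption: mean nq - (p - q)/2 and variance npq - 1/12,
    i.e. the curve the text says was used for Table I.
    """
    p = 1.0 - q
    mean = n * q - 0.5 * (p - q)
    sd = math.sqrt(n * p * q - 1.0 / 12.0)
    return normal_cdf((r + 0.5 - mean) / sd) - normal_cdf((r - 0.5 - mean) / sd)

if __name__ == "__main__":
    for n, q, r in [(1000, 0.01, 10), (1000, 0.03, 30)]:
        print(r, round(binomial_pmf(r, n, q), 5),
              round(poisson_pmf(r, n * q), 5),
              round(gaussian_area(r, n, q), 5))
```

Under these assumptions the central rows of the table (frequency 10 and frequency 30) are reproduced to the fifth decimal place.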


The object of the present section of our work is to indicate how far it is 
legitimate to use the Poisson-Exponential up to cell frequencies of the order 30 
in a population of about 1000* and how far we then reach a state of affairs, which 
for practical purposes may be described by ordinary tables of the Gaussian. It
will be seen from Table I that the Poisson-Exponential even for $n_s = 10$ and 30 is
not extremely divergent from the Binomial.


In Plate VII the transition of the exponential histograms of frequency towards 
the Gaussian form is indicated for cell-frequency = 1, 5, 10, 15, 20, 25 and 30; in 
the cases of 10 and 30 the corresponding Gaussian curves are drawn. 

It will be seen that with due caution the Poisson-Exponential may be reason- 
ably used up to frequencies of about 30 in the 1000, and that after that it would 
be fairly satisfactory to use the areas of the Gaussian curve as provided in the usual 
tables. 


(6) In order to table the results of the Poisson-Exponential for easy use, it
seemed desirable to turn them into percentages of excess and defect. For example
take the distribution for a frequency 5. It is:

 Value   Frequency       Per cent. of Cases in which:
   0    .006,737,945     a defect of 5 occurs              0.674
   1    .033,689,725     a defect of 4 or more occurs      4.043
   2    .084,224,310     a defect of 3 or more occurs     12.465
   3    .140,373,850     a defect of 2 or more occurs     26.503
   4    .175,467,310     a defect of 1 or more occurs     44.049
   5    .175,467,310     the true value occurs            17.547
   6    .146,222,755     an excess of 1 or more occurs    38.404
   7    .104,444,825     an excess of 2 or more occurs    23.782
   8    .065,278,015     an excess of 3 or more occurs    13.337
   9    .036,265,564     an excess of 4 or more occurs     6.809
  10    .018,132,782     an excess of 5 or more occurs     3.183
  11    .008,242,178     an excess of 6 or more occurs     1.370
  12    .003,434,238     an excess of 7 or more occurs     0.545
  13    .001,320,860     an excess of 8 or more occurs     0.202


* Of course in the Poisson-Exponential itself the total frequency plays no part; it is only useful in
testing the validity of the approximation.


44 On the Poisson Law of Small Numbers 


Thus we see that if the true value of the frequency be 5 for the average sample,
it will only lie outside the range 1 to 10 in .674 + 1.370 = 2.044 cases per cent., or
the odds are 49 to 1 that the value found will be from 1 to 10.


On the other hand it will lie outside the range 2 to 8 in 4.043 + 6.809 = 10.852 %
of cases, or once in about 9 trials the frequency will lie outside this range. Or,
again, once in about every four trials (25.8 %) the result will fall outside the
range 3 to 7.
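
The percentages of excess and defect quoted above follow directly from the Poisson-Exponential terms. The sketch below recomputes them for a true frequency of 5; the function names are illustrative and not taken from the memoir, and the mean is assumed to be a whole number so that "defect of x" is well defined.

```python
import math

def poisson_pmf(r, m):
    """Term of the Poisson-Exponential with mean m."""
    return math.exp(-m + r * math.log(m) - math.lgamma(r + 1))

def defect_excess(m, x):
    """Per cent. of samples showing a defect of x or more, and an excess of
    x or more, when the true cell frequency is the whole number m."""
    defect = sum(poisson_pmf(r, m) for r in range(0, m - x + 1))
    excess = 1.0 - sum(poisson_pmf(r, m) for r in range(0, m + x))
    return 100.0 * defect, 100.0 * excess

def outside_range(m, lo, hi):
    """Per cent. of samples in which the observed value falls outside lo..hi."""
    inside = sum(poisson_pmf(r, m) for r in range(lo, hi + 1))
    return 100.0 * (1.0 - inside)

if __name__ == "__main__":
    print(defect_excess(5, 5))      # defect of 5, excess of 5 or more
    print(outside_range(5, 1, 10))  # range 1 to 10
    print(outside_range(5, 2, 8))   # range 2 to 8
    print(outside_range(5, 3, 7))   # range 3 to 7
```

The same two functions, applied to each whole-number mean from 1 to 30, give the kind of entries collected in Table II.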


On the other hand if we write $\sigma = \sqrt{5\,(1 - .005)} = 2.23047$, we have $-4.5$
and $+5.5$ as the deviations from a mean 5 of all beyond 0.5 and above 10.5,
giving $x/\sigma = -2.0175$ and $+2.4658$ respectively. These cut off tail areas of
.02181 and .00684, respectively. Thus in 2.865, not 2.044, per cent. of cases
we should assert that the frequency would lie outside the range 1 to 10, or the
odds that it would lie inside this range are now only about 34 to 1, not 49 to 1.
Calculated from the Gaussian the frequencies outside ranges 2 to 8 and 3 to 7
correspond to 10.1 % and 26.2 % of the trials instead of 10.9 % and 25.8 %. If
we take for the standard-deviation of our Gaussian $\sqrt{npq - \frac{1}{12}} = 2.21171$, we find
that the odds in the first case are still only 35 to 1, but the percentages in the
other two cases are 11.3 and 25.8.
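
The Gaussian tail areas in this comparison can be checked with the probability integral. This is a small sketch only: `gaussian_outside` is a name introduced here, and it assumes, as the text does, that the range lo to hi is taken to extend half a unit beyond each end and that the curve is centred at the mean 5.

```python
import math

def normal_cdf(x):
    """Probability integral of the standard Gaussian."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gaussian_outside(mean, sd, lo, hi):
    """Per cent. of the Gaussian area lying below lo - 1/2 or above hi + 1/2."""
    below = normal_cdf((lo - 0.5 - mean) / sd)
    above = 1.0 - normal_cdf((hi + 0.5 - mean) / sd)
    return 100.0 * (below + above)

# The two standard deviations discussed in the text for a frequency of 5 in 1000.
sd_crude = math.sqrt(5 * (1 - 0.005))                   # 2.23047
sd_corrected = math.sqrt(5 * (1 - 0.005) - 1.0 / 12.0)  # 2.21171

if __name__ == "__main__":
    print(round(gaussian_outside(5, sd_crude, 1, 10), 3))
    print(round(gaussian_outside(5, sd_crude, 3, 7), 1))
    print(round(gaussian_outside(5, sd_corrected, 3, 7), 1))
```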


It will be clear that near the centre of the curve—especially when we equalise 
the excess and defect of the Gaussian by taking equal ranges on both sides—it 
does not give bad percentages of frequency, but that it does not lend itself to 
the accurate determination of the range for reasonable working odds such as 
50 to 1. 


It will be noted that the total area in excess and defect of 2 and more
= 23.782 + 26.503 = 50.285, or corresponds very nearly to the "probable error."
Actually the Gaussians with standard deviations of 2.23047 and 2.21171 give
probable errors of 1.504 and 1.492 respectively, so that the Gaussian with 1.5 as
the probable error is very nearly accurate.


Table II gives the Poisson-Exponential; it will enable the reader to appreciate
the range of probable variation in small frequencies. Thus we realise that in
37 % of cases in which the true frequency is 1, the cell will be found empty;
in 13.5 per cent. of cases it will be empty when the actual frequency is 2, and in
5 % of cases when the frequency is 3 and in 1.8 % when the frequency is 4. These
results indicate how rash it is to assume that a sample 4-fold table with one zero
quadrant signifies perfect dependence or association in the attributes of the
material sampled. The second line below gives the percentages of cases that 0
would appear in a cell when the actual number to be expected is that in the first
line, calculated from Table II on the usual theory of a priori probabilities:

 Actual        1      2      3      4      5      6      7      8      9    & over
 Percentage  63.21  23.25   8.55   3.15   1.16   0.43   0.16   0.06   0.02   0.01
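
The chance of an empty cell, and the a priori percentages just tabulated, can be sketched as follows. The second function rests on an assumption made here and not spelled out in the text: that every whole-number true frequency 1, 2, 3, ... is taken as equally likely beforehand, so that the weight of frequency $m$, given an empty cell, is $e^{-m}(e-1)$. That reading reproduces the surviving figures 63.21, 8.55 and 3.15.

```python
import math

def pct_empty(m):
    """Per cent. of samples in which a cell of true frequency m is found empty."""
    return 100.0 * math.exp(-m)

def pct_true_frequency_given_empty(m):
    """Per cent. weight of true frequency m when an empty cell is observed.

    Assumption: equal a priori weight on m = 1, 2, 3, ...; the normalising
    sum of exp(-m) over those values is 1 / (e - 1).
    """
    return 100.0 * math.exp(-m) * (math.e - 1.0)

if __name__ == "__main__":
    for m in range(1, 6):
        print(m, round(pct_empty(m), 1), round(pct_true_frequency_given_empty(m), 2))
```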
TABLE II.
Table of Poisson-Exponential for Cell Frequencies 1 to 30.

Per cent. occurrence of values differing by x or more in defect, of the actual
value, and of values differing by x or more in excess.

                                  Cell Frequencies
   x         1       2       3       4       5       6       7       8       9      10

 Defect
  10         -       -       -       -       -       -       -       -       -     .005
   9         -       -       -       -       -       -       -       -     .012    .050
   8         -       -       -       -       -       -       -     .034    .123    .277
   7         -       -       -       -       -       -     .091    .302    .623   1.033
   6         -       -       -       -       -     .248    .730   1.375   2.123   2.925
   5         -       -       -       -     .674   1.735   2.964   4.238   5.496   6.708
   4         -       -       -    1.832   4.043   6.197   8.177   9.963  11.569  13.014
   3         -       -    4.979   9.158  12.465  15.120  17.299  19.124  20.678  22.022
   2         -   13.534  19.915  23.810  26.503  28.506  30.071  31.337  32.390  33.282
   1      36.788  40.601  42.319  43.347  44.049  44.568  44.971  45.296  45.565  45.793

 Actual   36.788  27.067  22.404  19.537  17.547  16.062  14.900  13.959  13.176  12.511

 Excess
   1      26.424  32.332  35.277  37.116  38.404  39.370  40.129  40.745  41.259  41.696
   2       8.030  14.288  18.474  21.487  23.782  25.602  27.091  28.338  29.401  30.323
   3       1.899   5.265   8.392  11.067  13.337  15.276  16.950  18.411  19.699  20.845
   4        .366   1.656   3.351   5.113   6.809   8.392   9.852  11.192  12.422  13.554
   5        .059    .453   1.191   2.136   3.183   4.262   5.335   6.380   7.385   8.346
   6        .008    .110    .380    .813   1.370   2.009   2.700   3.418   4.146   4.875
   7        .001    .024    .110    .284    .545    .883   1.281   1.726   2.203   2.705
   8        .000    .005    .029    .092    .202    .363    .572    .823   1.110   1.428
   9         -      .001    .007    .027    .070    .140    .241    .372    .532    .719
  10         -      .000    .002    .008    .023    .051    .096    .159    .242    .346
  11         -       -      .000    .002    .007    .018    .036    .065    .105    .160
  12         -       -       -      .001    .002    .006    .013    .025    .044    .071
  13         -       -       -      .000    .001    .002    .005    .009    .017    .030
  14         -       -       -       -      .000    .001    .002    .003    .007    .013
  15         -       -       -       -       -      .000    .001    .001    .002    .006
  16         -       -       -       -       -       -      .000    .000    .001    .002
  17         -       -       -       -       -       -       -       -      .000    .001
  18         -       -       -       -       -       -       -       -       -      .001
  19         -       -       -       -       -       -       -       -       -      .000
TABLE II (continued).
Cell Frequencies 
av 11 12 | 18 Uy 15 16 iif 18 19 20 
| | 
Ly ed 22 | 
hes 21 
= 20 a 
a 19 sik eS 
=p 18 — 
a | 17 = — ‘000 “000 
es | 16 | | — | »:000'| 000) | COM mmoue 
ea | 15 | | 000 000 | ‘001 002 | 004 | ‘007 
Be | Ly ‘000 | -001 002 | 004 | ‘008 | ‘015 | -026 
eB ole ite ‘000 | 001 004 | 009 | 018 | 032] -052 | -078 
eed He 001 003 | 009 | 021 040 | -067 | ‘104 | -151 209 
8 || any 002 008 | 022 | -047.| -086 | +138 | -206 | ~-289"\") -3Sia/Nae500 
es | 10 020 052 105 }_ 181 279 | 401 543 | -706 | 886 | 1-081 
a | 129 121 229 | 374 | 553 | -763 | 1:000 | 1:60 | 17538 | 1-832 | 2-139 
peril 8 ‘492 | -760 | 1-073 | 1:423 | 1:800 | 2-199 | 2-612 | 3-037 | 3-467 | 3-901 
-£] 7 | 1510 | 2-034 | 2589 | 3-162 | 3-745 | 4-330] 4-912 | 5-489 | 6-056 | 6-613 
ae 5 | 3-752 | 4582 | 5-403 | 6-206 | 6:985 | 7:740 | 8-467 | 9°167 | 9-840 | 10-486 
is 5 | 7861 | 8-950 | 9-976 | 10-940 | 11-846 | 12-699 | 13°502 | 14-260 | 14-975 | 15-651 
8 4 | 14319 | 15-503 | 16°581 | 17°568 | 18-475 | 19-312 | 20-087 | 20-808 | 21-479 | 22-107 
2 3 | 23-198 | 24-239 | 25-168 | 26-004 | 26-761 | 27-451 | 28-084 | 28°665 | 29-203 | 29-703 
2 2 | 34-051 | 34-723 | 35°317 | 35-846 | 36-322 | 36-753 | 37-146 | 37-505 | 37-836 | 38-142 
1 | 45-989 | 46-150 | 46°31] | 46-445 | 46°565 | 46-674 | 46°774 | 46-865 | 46-948 | 47-026 
| | 
|Actual, 11-938 | 11-437 | 10-994 | 10°599 | 10-244 | 9-922 | 9-629 | 9°360 | 9-112 | 8:884 
“A 1 | 42-073 | 42-404 | 42-695 | 42-956 | 43-191 | 43-404 | 43-597 | 43-776 | 43-939 | 44-091 
8 2 | 31:130 | 31-846 | 32:486 | 33-064 | 33-588 | 34-066 | 34°503 | 34-909 | 35-283 | 35-630 
z 3 | 21-871 | 22-798 | 23°639 | 24-408 | 25-114 | 25-765 | 26-367 | 26-928 | 27-451 | 27-939 
= 4 | 14596 15-559 | 16-450 | 17-280 | 18-053 | 18-776 | 19-451 | 20-088 | 20-686 | 21-251 
2 5 | 9-261 | 10°129 | 10-953 | 11°736 | 12°478 | 13-184 | 13-852 | 14-491 | 15-099 | 15-677 
= 6 | 5593 | 6-297 | 6-983 | 7°650 | 8-297 | 8-923 | 9-526 | 10-111 | 10-675 | 11-219 
s | v7 | 3219 | 3742 | 4266 | 4-791 | 5-311 | 5-825 | 6-399 | 6-826 | 7:313 | 7-789 
|= | 8 | 1769 | 2-198 | 2-501 | 2°884 | 3-275 | 3-669 | 4-064 | 4-461 | 4-856 | 5-248 
© 929 | 1160 | 1:407 | 1-671 ; 1°947 | 9-932 | 9-593 | 9:824 || 3107 \\iardae 
| > 10 467 | -607 762 | . 933° | 1117 | 1-312 | 1-516 | 1732" |S kobaeeoaiep 
foes ana) Ble 225 305 | 396 | 502 | 619 | -746 | -882'| 1-030 | 1:185 | 1:348 
aa | 12 104 | +148 | = -201 261 331 ‘All ‘497 | +595 | -699 | - -809 
58 | 13 047 | 069 | 097 | -131 172 | -219 | -272 | 333 | 400 | “478 
Seca, 020} ‘031 046 | -063-| 086 |" 1140) “14d | 1625) eos mmmoes 
ea | Is 008 ‘O14 ‘O21 ‘030 042 ‘057 074 ‘096 121 149 
295 | 16 003 | -006 | 009 | -013 | 020 | 028 | 036 | 050 | 064 | -081 
cali | ll 001 002 | 004 | 006 | ‘009 | 014] 017 025 | 033 | 042 
i 18 001 000 | 002 | -002 | 004 | 006 | 008 | ‘O12 | “O17, |) =e-0z2 
2 19 000 | 000-001 001 002 | 003 | 003 | -006 | 008] O11 
8 20 ae ae 001 ‘000 | -001 ‘002 | 002 | 003 | -004 | -005 
5 Of |, was = 000 | 000 | -000 | 001 001 | 002 | -002 | -003 
E Bea = 000 | 000-001 001 | ‘001 
So eee dine = = = ze = an = 000 | 000 | 001 
° 24 = = = = = = = = = 000 
= 25 at em | ees = = = = = = == 
8 26 = = ke i oi 
R a7 _ — — — — — — 
aw 28 Es at, ae ee Sn (ure — = = = 
TABLE II (continued).


Cell Frequencies 


4 


= 


v 21 22 23 24 25 26 or 28 29 30 
Al , 
° Op | ae es ee = = = = = = = ‘000 
5 21 = ral -_ oe = =e ‘000 000 ‘000 ‘001 
8 20 —_ 2 — — 000 000 ‘O01 001 ‘001 “002 
| 19 ae aa -000 -000 ‘001 ‘001 002 | = -008 004 006 
2p 18 -000 000 001 001 002 004 006-009 012 ‘O17 
|-Ba | 17 ‘001 002 003 ‘005 008 ‘O11 016 | 023 ‘O31 ‘O41 
some 16 003 ‘006 ‘010 O15 022 ‘031 043 | 056 073 092 
| alee eee 012 020 -030 043 O59 ‘078 “102 “129 160 195 
An Mew 039 058 ‘081 “109 142 “180 224 273 328 387 | 
ES 1 ‘ll “150 ‘198 252 314 384 “460 543 632 727 | 
Se | 19 277 355 “443 540 ‘647 762 884 | 1:014 1-151 1-293 | 
pon lett “625 763 912 | 1:072 | 1°240 | 1-417 | 1-601 | 1°791 1987 | 2-187 
een | 10 1-290 | 1°512 | 1°743 | 1:983 | 2:229 | 2-482 | 2-739 | 3:000 | 3-263 | 3:528 
ze 9 2-455 | 2°778 | 3°107 | 3:440 | 3°775 | 4111 | 4:446 | 4°781 | 5-114 | 5-444 
ee 8 4-336 | 4°769 | 5-200 | 5-626 | 6:048 | 6-463 | 6°872 | 7-274 | 7-669 | 8:057 
zg Hi 7°157 | 7°689 | 8-208 | 8-713 | 9-204 | 9-682 | 10-147 | 10-599 | 11-038 | 11°465 
38 6 | 11°107 | 11:704 | 12-277 | 12:827 | 13°358 | 13°867 | 14°357 | 14°830 | 15-285 | 15°724 
aa 5 | 16292 | 16-900 | 17-477 | 18-025 | 18°549 19-048 | 19°525 | 19°981 | 20-417 | 20-836 
= 4 | 22°696 | 23-250 | 23°771 | 24:263 | 24°730 | 25-172 | 25°591 | 25-990 | 26-371 | 26-734 
© 3 | 30-168 | 30:603 | 31:010 | 31°391 | 31°753 | 32-094 | 32-416 | 32-721 | 33-011 | 33-287 
2 2 | 38-426 | 38-691 | 38:938 | 39°168 | 39-387 | 39°593 | 39°786 | 39:970 | 40143 | 40-308 
1 | 47-097 | 47:164 | 47-226 | 47-283 | 47-340 | 47-392 | 47-440 | 47-486 | 47-530 | 47-572 
Actual] 8°671 | 8:473 | 8-288 | 8-115 | 7:°952 | 7:799 | 7°654 | 7-517 7°387 | 7:264 
2 1 | 44-232 | 44:363 | 44-485 | 44-603 | 44°708 | 44°810 | 44:906 | 44:997 -| 45-083 | 45°165 
8 2 | 35-955 | 36-258 | 36-542 | 36-812 | 37-062 | 37-299 | 37-525 | 37-739 | 37-942 | 38-135 
2 3 | 28-397 | 28:828 | 29-235 | 29-620 | 29-982 | 30:326 | 30°653 | 30-965 | 31-262 | 31-546 
5 4 | 21-785 | 22-290 | 22-770 | 23-227 | 23-660 | 24-074 | 24:469 | 24°847 | 25-208 | 25-555 
a 5 | 16-230 | 16°758 | 17-264 | 17-748 | 18-211 | 18-655 | 19-083 | 19-493 | 19-888 | 20-269 
# 6 | 11-744 | 12-251 | 12-740 | 13-213 | 13-669 | 14:110 | 14°538 | 14°951 | 15-351 | 15°738 
a ” 8254 | 8709 | 9°153 | 9:585 | 10-007 | 10-418 | 10°819 | 11°210 | 11-591 11-962 | 
B 8 5-637 | 6:022 | 6-402 | 6-777 | 7:146 | 7:509 | 7:866 | 8-218 | 8°562 | 8-901 
& 9 3°742 | 4:052 | 4:362 | 4:670 | 4:978 | 5-284 | 5°588 | 5-890 | 6-188 | 6-484 | 
ES 10 2-415 | 2-654 | 2°895 | 3:188 | 3°385 | 3-632 | 3°880 | 4:129 | 4:377. 4°625 
a 11 1517 | 1:692 | 1°873 | 2-057 | 2-246 | 2-438 | 2-633 | 2°831 | 3-030 | 3-230 | 
tai | 812 927 | 1°051 | 1°18] 1315 | 1°456 | 1:599 | 1:°747 | 1:899 | 2-053 | 2-210 
isso 13 “552 637 727 “821 92] 1025 | 1°134 | 1:247 | 1°362 | 1:481 
HS 14 320 376 437 “500 570 643 720 ‘801 885 | 973 
4 15 ‘181 217 256 298 “345 394 ‘448 “504 564 626 
2 | 16 ‘100 -122 147 ‘173 204. 237 272 311 352 395 
‘ee | 17 054 -067 082 ‘098 ‘118 139 162 “188 ‘215 | +245 
ig 18 -028 036 ‘045 ‘055 ‘067 ‘080 095 ‘111 129 | +149 
° 19 O15 ‘019 024 -030 ‘037 ‘045 054. ‘065 076 | 089 
S) 20 007 010 013 016 ‘020 ‘025 ‘03 ‘037 044 052 
B 21 004 -005 ‘007 009 ‘Oll | 014 ‘O17 021 025 ‘030 
= 22 002 ‘002 004 ‘005 006-007 009 ‘O11 O14 ‘O17 
8 23 ‘001 ‘001 002 ‘003 003-004 ‘005 006 ‘007 ‘010 
S Qh 001 001 ‘001 002 002 002 003 003 004 006 
= 25 ‘000 000 001 ‘001 ‘001 001 ‘001 002 002 003 
8 26 =s ee -000 ‘000 000 ‘001 001 ‘001 ‘O01 002 — 
x or es = : ‘000 -000 000 -000 ‘001 
a 28 Bs = = -_ _ : | 


“O00 
PART II. CRITICISMS OF PREVIOUS APPLICATIONS OF
POISSON'S LAW OF SMALL NUMBERS.


(7) We now turn to the illustrations which various authors have given of 
the Law of Small Numbers. 


"Student's" Cases. We take first the series given by "Student" in his memoir
on counting with a Haemacytometer*. They are of special importance because
the series at first appear of fairly adequate size, namely consisting of 400
individuals, and further we should anticipate that the Law of Small Numbers
would hold in his cases. He obtains better fits with the binomial than with the
exponential but, as he remarks, he has one more constant at his disposal. On the
other hand, if the exponential be a true approximation, the binomial ought to come
out with a large $n$ and a small but positive $q$. "Student" finds for his four
series:

I.    $400 \times (1.1893 - .1893)^{-3.6054}$,
II.   $400 \times (.97051 + .02949)^{46.2084}$,
III.  $400 \times (1.0889 - .0889)^{-20.2473}$,
IV.   $400 \times (.9525 + .0475)^{98.5263}$.

II. and IV. may, perhaps, be held fairly to satisfy the conditions, although it
is not certain if 46 is to be considered a large $n$ or .05 a very small $q$.


I. and III. fail to satisfy the conditions at all, unless the probable errors of $q$
and $n$ are such that $q$ might really be a small positive quantity and $n$ really large
and positive. The following are the values for the four series of $n$ and $q$ and their
probable errors:

I.    $q = -.1893 \pm .0647$,   $n = -3.6054 \pm 1.2209$.
II.   $q = +.0295 \pm .0457$,   $n = 46.2084 \pm 71.7373$.
III.  $q = -.0889 \pm .0534$,   $n = -20.2473 \pm 12.1165$.
IV.   $q = +.0475 \pm .0452$,   $n = 98.5263 \pm 93.7494$.


Now while these results are very satisfactory for II. and IV., they are not
wholly conclusive for I. and III. We can approach the matter from another
standpoint; the probable error of $q$ for $p = 1$ is
$$.67449\, \frac{1}{\sqrt{N}} \sqrt{2} = .67449 \times .0707$$
in "Student's" cases. Thus the deviation of $q$ from $q$ a very small quantity is for
I. 2.68 times the S. D., and for III. 1.26 times the S. D. Since $q$ may be either
positive or negative, we may reasonably apply the probability tables and the odds
against deviations occurring as great as these are in one trial about 250 to 1 and
9 to 1 respectively. Hence in four trials we should still have large odds against
their combined appearance.
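
The probable error of $q$ on the hypothesis $q = 0$ depends only on the number of observations $N$ in the series, which makes it easy to tabulate for every series discussed in this Part. A minimal sketch, with an illustrative function name:

```python
import math

def probable_error_q(N):
    """Probable error of q when q = 0 (p = 1): .67449 * sqrt(2 / N)."""
    return 0.67449 * math.sqrt(2.0 / N)

if __name__ == "__main__":
    # N = 400: "Student"; 25, 14, 9: Bortkewitsch's individual series;
    # 99 and 112: his pooled totals.
    for N in (400, 25, 14, 9, 99, 112):
        print(N, round(probable_error_q(N), 4))
    # Deviations of q for "Student's" series I and III in units of the S.D.
    sd = math.sqrt(2.0 / 400)
    print(round(0.1893 / sd, 2), round(0.0889 / sd, 2))
```

The values printed for N = 25, 14, 9, 99 and 112 are the probable errors .1908, .2549, .3180, .0959 and .0901 quoted for Bortkewitsch's series.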

* Biometrika, Vol. v. p. 356. 
We have said that the results for II. and IV. are fairly satisfactory, i.e. we
mean that they are consistent with $q$ being small and positive and $n$ being large;
but of course they are also consistent with $q$ being negative and $n$ being small and
negative.

It will be obvious from these results for "Student's" data that it is extremely
difficult to test the legitimacy of the hypothesis on which the "Law of Small
Numbers" is based. In none of the cases dealt with by Bortkewitsch, much less
in those dealt with by Mortara, are the populations ($N$) anything like as extensive
as those considered by "Student." But populations of even 400 give, as we see, too
large values of the probable errors of $q$ and $n$ for us to be certain of our conclusions.


(8) Bortkewitsch’s Cases. Taking Bortkewitsch next, he deals with the 
following cases : 

I. Suicides of Children in Prussia for 25 years: (a) Boys, (b) Girls, 25 cases. 

II. Suicides of Women in eight German States for 14 years: 112 cases or 
8 subseries of 14. 

III. Accidental Deaths in 11 Trade Societies in 9 years: 99 cases, or 11 sub-
series of 9.


IV. Deaths from the Kick of a Horse in 14 Prussian Army Corps for 20 years:
280, or, as Bortkewitsch, 200 cases.


It will be noted at once that Bortkewitsch's populations ($N$) are far too small
for any effective determination of the legitimacy of his application of Poisson's
formula to his data.


We take his cases in order: 
I. (a) Suicides of Boys. 


TABLE III.

 Number of Suicides    0    1    2    3    4    5    6    7 and over
 Number of Years       4    8    5    3    4    0    1    0


The binomial is:
$$25\,[1.2033 - .2033]^{-9.6425},$$
Mean $= 1.9600$ and $\mu_2 = 2.3584$.
We have $q = -.2033 \pm .2421$, $n = -9.6425 \pm 10.9416$.


If $q$ were really zero its probable error would be $\pm .1908$. Clearly 25 cases are
wholly inadequate to test the legitimacy of applying the Poisson-Exponential to the
frequency*. But to what extent is the reader made conscious by Bortkewitsch
that his cases fail entirely to demonstrate the legitimacy of applying his hypotheses?

* The $\chi^2$ for the binomial is 2.379 and for the exponential 2.836, showing a somewhat better
result for the binomial.
I. (b) Suicides of Girls.


TABLE IV.

 Number of Suicides    0    1    2    3
 Number of Years      15    9    1    0


The binomial is:
$$25\,[.7418 + .2582]^{1.7041},$$
Mean $= .4400$ and $\mu_2 = .3264$.
We find $q = .2582 \pm .1012$, $n = 1.7041 \pm .7850$.
As in the case of the boys' suicides, if $q$ were practically zero its probable error
would be $\pm .1908$, and there is nothing in this result again to justify us in asserting
that $q$ is indefinitely small and $n$ indefinitely large.
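
The constants of these binomials can be recovered from a frequency table by equating its mean and second moment to $nq$ and $npq$. The memoir does not write the formulae out in this section, so the following is a sketch under that assumption: $q = 1 - \mu_2/\text{mean}$ and $n = \text{mean}/q$, with `fit_binomial` an illustrative name. The data are the girls' suicides of Table IV.

```python
def fit_binomial(freqs):
    """Fit N (p + q)^n by moments to frequencies of the values 0, 1, 2, ...

    Assumption: mean = n q and mu_2 = n p q with p = 1 - q, so that
    q = 1 - mu_2 / mean and n = mean / q.  A negative q (mu_2 > mean)
    gives a negative binomial; q near zero is the Poisson-Exponential case.
    """
    total = sum(freqs)
    mean = sum(x * f for x, f in enumerate(freqs)) / total
    mu2 = sum(x * x * f for x, f in enumerate(freqs)) / total - mean ** 2
    q = 1.0 - mu2 / mean
    n = mean / q
    return mean, mu2, q, n

if __name__ == "__main__":
    girls = [15, 9, 1, 0]  # years with 0, 1, 2, 3 suicides (Table IV)
    mean, mu2, q, n = fit_binomial(girls)
    print(round(mean, 4), round(mu2, 4), round(q, 4), round(n, 4))
```

A series whose second moment exceeds its mean necessarily yields a negative $q$ and a negative $n$ on this fitting, which is how the negative binomials of this Part arise.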


Actually we have:


TABLE V.
Number of Suicides per Year.

                     0       1       2       3
 Actual             15       9       1       -
 Bortkewitsch      16.1     7.1     1.6   [illegible]
 Binomial (a)      15.0     8.9     1.1      -
 Binomial (b)      15.2     8.7     1.1      -


(a) is the binomial considered above, (b) is the binomial obtained by taking
$n$ a whole number $= 2$, and $q = \text{mean}/2 = .22$, i.e. $25\,(.78 + .22)^2$.

It is clear that either (a) or (b) gives better results than the Poisson-Exponential.
Applying the test of goodness of fit, we have

$\chi^2 = .007$ for the binomial (a),
$\chi^2 = .610$ for Bortkewitsch's solution.
Both give $P > .60$ but the first is much better than the second.
If both boys and girls are taken together, we find the binomial
$25\,(.9333 + .0667)$ [exponent illegible].

This is the nearest approach to a small $q$ and big $n$ we have so far found, i.e. the
nearest approach so far to an exponential, but it is reached by a process, i.e. that of
adding together two series of entirely different means and variabilities in a manner
which cannot be justified, for Bortkewitsch's hypothesis depends essentially on the
homogeneity of his material. Even here the fit of the point binomial is slightly
better than that of the exponential.
II. Suicides of Women in Eight German States. Bortkewitsch gives the
following table:


TABLE VI.
Number of Suicides of Women per Year.

 State                               0   1   2   3   4   5   6   7   8   9  10   Totals
 (a) Schaumburg-Lippe                4   4   2   4   -   -   -   -   -   -   -     14
 (b) Waldeck                         [entries illegible]                           14
 (c) Lübeck                          [entries illegible]                           14
 (d) Reuss ä. L.                     [entries illegible]                           14
 (e) Lippe                           [entries illegible]                           14
 (f) Schwarzburg-Rudolstadt          -   1   -   2   -   5   3   2   1   -   -     14
 (g) Mecklenburg-Strelitz            [entries illegible]                           14
 (h) Schwarzburg-Sondershausen       [entries illegible]                           14
 Totals                                                                           112

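The resulting binomials are:

(a)  14 (.9714 + .0286) [exponent illegible],
(b)  14 (.8571 + .1429) [exponent illegible],
(c)  14 (.5819 + .4181) [exponent illegible],
(d)  14 (1.0058 - .0058) [exponent illegible],
(e)  14 (1.3929 - .3929) [exponent illegible],
(f)  14 (.6071 + .3929) [exponent illegible],
(g)  14 (1.5792 - .5792) [exponent illegible],
(h)  14 (1.6609 - .6609) [exponent illegible].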

Thus it will be seen that of the eight binomials only four have a positive $q$,
and of these only one can be said to have a very small $q$, and even in this case the
$n$ is not indefinitely large. Of the four negative binomials three have quite
substantial $q$'s, and the fourth with its small negative $q$ corresponds most closely
to the Poisson-Exponential. The probable error of $q$ for $q = 0$ is $\pm .2549$. The
number, 14, of cases taken is therefore wholly inadequate to test whether the
Poisson-Exponential may be applied to these data. The mean value of $q$ is
negative and $= -.0820 \pm .0901$, and the standard deviation of $q = .3928 \pm .0637$,
which are within the limits of random sampling of $q = 0$ with a standard deviation
of .3779. We shall return to a different manner of considering the point later.
At present we wish only to indicate that the hypothesis is that $q$ is a very small
positive quantity and that data which give $q$ a standard deviation of .3928, or in
the next example of .4714, are really inadequate to test such a hypothesis; for in
the resulting binomials $q$ may easily lie anywhere between $+.8$ and $-.8$, and it
is not possible to demonstrate that its real value is practically an exceeding small
positive quantity.
III. Accidental Deaths in 11 Trade Societies. Bortkewitsch provides data
from which the following table is deduced:


TABLE VII.
Accidental Deaths.

[Cell entries illegible in the source. Rows: Index Number of Society (13, 14, 12,
20, 23, 27, 29, 41, 40, 42, 55), each with a total of 9. Columns: number of
accidental deaths, with a line of totals.]


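The resulting binomials are:
(13)  9 (.4914 + .5086) [exponent illegible],
(14)  9 (.6184 + .3816) [exponent illegible],
(12)  9 (1.9227 - .9227) [exponent illegible],
(20)  9 (1.1282 - .1282) [exponent illegible],
(23)  9 (.9921 + .0079) [exponent illegible],
(27)  9 (.5229 + .4771) [exponent illegible],
(29)  9 (1.4130 - .4130) [exponent illegible],
(41)  9 (.8454 + .1546) [exponent illegible],
(40)  9 (2.0342 - 1.0342) [exponent illegible],
(42)  9 (.9322 + .0678) [exponent illegible],
(55)  9 (.6154 + .3846) [exponent illegible].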

Of these eleven binomials seven have a positive $q$; only one of these (23)
actually corresponds to a really small $q$ and large $n$, although a second, (42),
approximates to this condition. In the five other cases the $q$'s are quite
substantial; in (13) the $q$ is larger than $p$. Of the four negative $q$'s none can be said
to be so small and the $n$ so large as to suggest that they really correspond to the
Poisson-Exponential. The probable error of $q$ for $q = 0$ is, however, $\pm .3180$, and
thus for such small series, no test whatever can be really reached of the legitimacy of
applying the Poisson-Exponential to such data. We may note, indeed, that seven
of the eleven values of $q$ exceed the probable error and two of these are more than
three times the probable error. We should only expect two negative values of $q$
as great or greater than .9227 in 80 trials, whereas two have occurred in 9 trials,
so that the odds are considerably against such an experience. The mean value of
$q$ is $-.0469 \pm .0959$ and the standard deviation of $q$ is $.5127 \pm .0678$, both results
compatible with $q$ indefinitely small and a standard deviation $= .4714$. The main
problem, however, of the legitimacy of applying the Poisson-Exponential to such
series cannot be answered by data involving only total frequencies of 9 to 14
cases in the individual series.

Bortkewitsch examines the matter from another standpoint. He clubs the
results given for each application of the Poisson-Exponential together and
examines the observed totals against the sums of the calculated totals. Thus
calculating the 11 Poisson-Exponential series* and adding them together
Bortkewitsch finds for observed and calculated deaths:


TABLE VIII.
Accidental Deaths in 11 Trade-Societies.

[Cell entries illegible in the source. Columns: Number of Deaths, from 0 to
13 & over, with Totals. Rows: Observed Frequencies; Sums of 11 Exponentials;
Single Binomial.]


If we attempt to fit a single binomial to the observed line of totals, we obtain:
$m = 4.3636$, $\sigma^2 = 7.5849$,
leading to the negative binomial:
$$99\,(1.7382 - .7382)^{-5.9111}.$$
Here: $q = -.7382 \pm .1829$†, $n = -5.9111 \pm 1.391$,
or the constants are significantly substantial with regard to their probable errors.
The resulting frequencies are given in the last line of the table above. The reader
will be surprised to see how closely the single negative binomial determined by
two constants gives the same result as the sum of the eleven Poisson-Exponentials
determined by eleven constants, no one of which is really of any significance for its
own exponential*. If we apply the condition for "goodness of fit," $\chi^2 = 5.83$ for
the single binomial and $\chi^2 = 5.88$ for the sum of the eleven Poisson exponentials,
leading to $P = .950$ and $P = .951$ respectively, or the fit with a single negative
binomial is slightly better than that with eleven exponentials. The two constants
are significant, the eleven constants have no real significance for their individual
series, as is demonstrated by the fact that the binomials for these series do not
approximate to the Poisson-Exponential type.


* The values of the means and standard deviations for the eleven societies are:

 Society    m       σ       Society    m       σ       Society    m       σ
   13     7.889   1.969       23     6.222   2.485       40     2.889   2.424
   14     2.556   1.343       27     1.889   0.994       42     4.556   2.061
   12     2.556   2.217       29     5.889   2.885       55     4.333   1.633
   20     4.333   2.211       41     5.111   2.079

All these means are less than 10, which is the limit reached by Bortkewitsch's Tables for the Poisson-
Exponential. Bortkewitsch says he has taken the societies for which "the statistics indicated the
smallest numbers of such accidents." This is not very clear. It is certain that a society with a mean
number of accidents = 100, if it consisted of 200,000 members, would be more suitable for application
of the exponential, than one with a mean of 8 if it only contained 10,000 members. Both Bortkewitsch
and Mortara confine their results to means less than 10, and seem to indicate that "smallness" has
been determined by the absolute frequencies, but clearly it is relative frequency with which we have to
deal. The use of such a term as Das Gesetz der kleinen Zahlen for the Poisson-Exponential seems open
to serious objection, if it be associated with "m" an absolutely small number, and not with smallness
of "q."

† For $q = 0$, the probable error would be $\pm .0959$ and accordingly $q$ is very divergent from the
Poisson-Exponential value of zero.


We may now consider the previous case of suicides of women from the same
standpoint†. The following are the data as given by Bortkewitsch:


TABLE IX.
Suicides of Women in Eight German States.

Number of Suicides        0     1     2     3     4     5     6    7    8    9   10 & over  Totals
Observed Frequencies      9    19    17    20    15    11     8    2    3    5       3        112
Sum of 8 Exponentials    8.0  16.9  20.3  18.7  15.1  11.4   8.3  5.6  3.6  2.1     2.0       112
Single Binomial         12.6  18.4  18.8  16.4  13.2   9.9   7.2  5.1  3.5  2.4     4.5       112


For the single binomial we have:
m = 3.4732, σ² = 8.2312,
leading to: 112 (2.3699 − 1.3699)^{−2.5354},
where q = −1.3699 ± .1490, n = −2.5354 ± .8076.


If q were very small its probable error would be ±.0901. The values of q and n
are quite significant, q is large and negative and n is small and negative. The
resulting frequencies are given in the last line of the table as “Single Binomial.”
Turning now to the test of “goodness of fit,” we have for the sum of the 8
exponentials χ² = 7.957, and for the single binomial χ² = 7.740, leading to P = .633

* If the reader will turn to the first footnote on p. 53 he will note that for nine cases, the standard
deviations of the means (σ/√9) are roughly about .7 or errors of ±1 to ±1.5 may easily occur in the
means. Hence with the possible exception of (13) and (27) the m’s have not significant differences, and
are not typical of the individual societies.

† The values of the means and standard deviations are:

State                 m          σ    |  State                         m       σ
Schaumburg-Lippe    1.429      1.178  |  Lippe                       2.857   1.995
Waldeck          [illegible]   1.378  |  Schwarzburg-Rudolstadt      5.143   1.767
Lübeck              2.571      1.223  |  Mecklenburg-Strelitz        5.286   2.889
Reuss ä. L.         2.648      1.631  |  Schwarzburg-Sonderhausen    5.642   3.061

The standard deviation of the mean is here σ/√14, or, say, .5. Thus errors of 1 might easily occur
in the values of m. There are probably significant differences between the first five and the last three
states, but not between the first five among themselves or the last three among themselves. Thus the
Poisson-Exponentials, if correct in theory, are not significant for the individual states.
and .654 respectively. Thus again the single binomial with only two constants
gives a fit slightly better than the sum of eight exponentials with eight constants.
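
The two values of χ² quoted for Table IX can be checked directly from the tabulated frequencies. What follows is a minimal sketch in Python and is not part of the original paper. It assumes that the observed row of Table IX reads 9, 19, 17, 20, 15, 11, 8, 2, 3, 5, 3 and it uses the theoretical frequencies to one decimal place as printed; the helper name `chi_square` is purely illustrative.

```python
# Pearson's chi-square for Table IX, using the frequencies as tabulated.
# Assumption: the observed row reads 9, 19, 17, 20, 15, 11, 8, 2, 3, 5, 3.

def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [9, 19, 17, 20, 15, 11, 8, 2, 3, 5, 3]
sum_of_exponentials = [8.0, 16.9, 20.3, 18.7, 15.1, 11.4, 8.3, 5.6, 3.6, 2.1, 2.0]
single_binomial = [12.6, 18.4, 18.8, 16.4, 13.2, 9.9, 7.2, 5.1, 3.5, 2.4, 4.5]

chi2_exponentials = chi_square(observed, sum_of_exponentials)
chi2_binomial = chi_square(observed, single_binomial)

print(f"sum of 8 exponentials: chi^2 = {chi2_exponentials:.3f}")
print(f"single binomial:       chi^2 = {chi2_binomial:.3f}")
```

With the rows as given, the sketch returns about 7.957 and 7.740, in agreement with the values in the text.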

Bortkewitsch looking at the observed frequencies and the sum of 8 or 11 
exponentials—without using any satisfactory test for “goodness of fit ”—assumes 
that the coincidence is so good as to justify his hypothesis. But a better fit can 
be obtained with two instead of 8 or 11 constants by simply using a negative 
binomial. We must note here that Bortkewitsch is using the final coincidence 
merely as justification of the Poisson-Exponential; the total frequency is not 
describable in terms of the 8 or 11 constants as it is in terms of the two, for 
these eight constants are not really significant for his individual eleven trade 
societies or for the suicides in the individual eight states. If he wants to describe 
the total, he has no constants by which he can do it. If, on the other hand, he 
wishes to describe what has occurred in the individual societies or states, we have 
seen that their binomials differ very widely from Poisson-Exponentials. If, lastly, 
no stress be laid on the individual cases as having too large probable errors, but 
only on the general coincidence with total frequencies, then the same coincidence 
would justify us in using a single binomial with two constants only*. It appears 
to us that to properly test the Poisson-Exponential, we need not 9 or 14 instances 
in the individual case, but several hundred instances,—more, indeed, than “Student” 
has taken—and that no proof of the “Law of Small Numbers” can be obtained 
on data such as those of Bortkewitsch or Mortara. 


IV. Deaths from the Kick of a Horse in Prussian Army Corps, omitting four 
Corps with Bortkewitsch. 


Here the results are:

TABLE X.

Number of Deaths     0     1     2     3     4    Totals
Number of Corps    109    65    22     3     1     200


Whence:
m = .61, μ₂ = .6079
and the binomial is:
200 (.996,557 + .003,443)^{177.1707}.
This is the first of Bortkewitsch’s illustrations for which his hypothesis that q is
small and n large is really justified by his data. For:
q = .0034 ± .0670,
n = 177.1711 ± 3449.108.
The probable error of q for q really zero is ±.0674.
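
The constants of this binomial follow from the first two moments of Table X, using mean = nq and μ₂ = npq. The sketch below, in Python, is an illustration added here rather than the authors' own computation; the function name `fit_binomial_by_moments` is hypothetical.

```python
# Fit a binomial N (p + q)^n to a frequency table by its mean and variance,
# using mean = n q and variance = n p q, so that p = variance / mean.

def fit_binomial_by_moments(values, frequencies):
    """Return (mean, variance, p, q, n) from the first two moments."""
    total = sum(frequencies)
    mean = sum(x * f for x, f in zip(values, frequencies)) / total
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, frequencies)) / total
    p = variance / mean   # from npq / nq
    q = 1.0 - p           # negative whenever the variance exceeds the mean
    n = mean / q          # negative for a "negative binomial"
    return mean, variance, p, q, n

# Table X: deaths per corps per year (10 corps, 20 years).
deaths = [0, 1, 2, 3, 4]
corps_years = [109, 65, 22, 3, 1]

mean, variance, p, q, n = fit_binomial_by_moments(deaths, corps_years)
print(f"m = {mean:.4f}, mu2 = {variance:.4f}")
print(f"p = {p:.6f}, q = {q:.6f}, n = {n:.2f}")
```

The sketch reproduces m = .61 and μ₂ = .6079, with q close to .0034 and n close to 177; the last digits of n depend on how far q is carried before dividing.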
* Of course immensely better general total fits are obtained by using the sums of the actual 8 or 11
binomials than by the Poisson-Exponential sum or the single binomial, but the results in that case
involve 16 or 22 non-significant constants.
The actual results as given by the binomial and the Poisson-Exponential are:


TABLE XI.

Number of Deaths      0        1       2      3     4 and over
Observed            109       65      22      3         1
Binomial            108.6     66.4    20.2    4.1       0.7
Exponential         108.7     66.3    20.2    4.1       0.7


Actually if we work to two decimal places in the frequencies we have χ² = .61
for both binomial and exponential, or the goodness of fit is practically identical.


In this case it seemed worth discussing the binomial fit more at length. 
Taking the moment coefficients about the mean we have: 


(i) Mean = nq = .6100.
(ii) μ₂ = npq = .6079.
(iii) μ₃ = npq (p − q) = .590,562.
(iv) μ₄ = npq (1 + 3npq − 6pq) = 1.643,373.


We have already discussed the binomial from (i) and (ii), giving χ² for goodness
of fit = .6096. Using (ii) and (iii) we have for the binomial
200 (.985,739 + .014,261)^{[exponent illegible]},
giving χ² = .665.
Using (iii) and (iv) we have:
200 (.979,524 + .020,057)^{[exponent illegible]},
giving χ² = .707.
Putting: β₂ = μ₄/μ₂² and β₁ = μ₃²/μ₂³,
we have: β₂ − 3 = (1 − 6pq)/npq, β₁ = (1 − 4pq)/npq,
and working from β₁ and β₂ we find:
200 (.969,150 + .030,850)^{[exponent illegible]},
and in this case χ² = 1.1286.


This of course does not give a bad fit, but it is clear that working from the 
lowest moment coefficients, as we might anticipate, gives the best results. 
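
The way two of these alternative binomials arise from the higher moment coefficients can be sketched as follows. This Python fragment is an added illustration under stated assumptions: it takes the quoted values μ₂ = .6079, μ₃ = .590,562 and μ₄ = 1.643,373 as given, and solves the relations μ₃/μ₂ = p − q and (β₂ − 3)/β₁ = (1 − 6pq)/(1 − 4pq) for q. The variable names are illustrative only.

```python
import math

# Central moments of Table X as quoted in the text.
mu2, mu3, mu4 = 0.6079, 0.590562, 1.643373

# Using (ii) and (iii): mu3 / mu2 = p - q, together with p + q = 1.
q_from_23 = (1.0 - mu3 / mu2) / 2.0
n_from_23 = mu2 / (q_from_23 * (1.0 - q_from_23))

# Using beta1 and beta2: (beta2 - 3) / beta1 = (1 - 6pq) / (1 - 4pq).
beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
ratio = (beta2 - 3.0) / beta1
pq = (1.0 - ratio) / (6.0 - 4.0 * ratio)
q_from_betas = (1.0 - math.sqrt(1.0 - 4.0 * pq)) / 2.0  # smaller root of q(1 - q) = pq
n_from_betas = (1.0 - 4.0 * pq) / (beta1 * pq)

print(f"from (ii), (iii):  q = {q_from_23:.6f}, n = {n_from_23:.3f}")
print(f"from beta1, beta2: q = {q_from_betas:.6f}, n = {n_from_betas:.3f}")
```

The two values of q obtained in this way lie close to the .014,261 and .030,850 quoted above, and in neither case does n come anywhere near the size of an army corps.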


But if q be the chance of death from the kick of a horse, and n the number of
men in an army corps, then the binomial should be


200 (p + q)^{n}.


Now it is obvious that none of the binomials give, by their value of n, any
approach to the real number of men in an army corps. If we start with the
number of men n in an army corps as 50,000*, we have nq = .61 and q = .000,0122,
thus reaching the binomial
200 (.999,9878 + .000,0122)^{50,000},


giving as compared against Bortkewitsch:


                 Binomial     Bortkewitsch
0                108.6876      108.6703
1                 66.3002       66.2889
2                 20.2213       20.2181
3                  4.1115        4.1110
4 and over          .7035         .7034
and χ² =            .608,298      .608,318


or, the slight advantage to the binomial exists but is of no significance. 


Now it seems to us that in this case the use of the exponential is justified for 
the total frequencies, but as far as describing those frequencies is concerned, it 
gives no better result than the binomial. But as in the other five of Bortke- 
witsch’s cases the Exponential is not justified by the individual series themselves. 


It is perfectly true that the exponential has a definite theory behind it, and 
is interpretable in terms of that theory, i.e. we must suppose the probability of an 
occurrence very small and the chance of its repetition absolutely identical. But 
is the second of these conditions ever likely to be demonstrable a priori, or must 


* This supposes that every man in the army corps is equally liable to death from the kick of
a horse; of course a very arbitrary assumption.
† To illustrate the idleness of the application of the Poisson-Exponential even to these data for the
Prussian Army Corps, we give here the binomials for the whole of the 14 corps.


Index Number
of Corps        Binomial

G               20 (.95 + .05)^{16.0000}
I               20 (1.325 − .325)^{−2.4615}
II              20 (1.5667 − .5667)^{−1.0585}
III             20 (.9 + .1)^{6.0000}
IV              20 (.6 + .4)^{1.0000}
V               20 (.6318 + .3682)^{1.4938}
VI              20 (1.0912 − .0912)^{−9.3202}
VII             20 (.9 + .1)^{6.0000}
VIII            20 (.65 + .35)^{1.000}
IX              20 (.8115 + .1885)^{3.4483}
X               20 (1.05 − .05)^{−15.0000}
XI              20 (1.11 − .11)^{−11.3036}
XIV             20 (1.05 − .05)^{−24.0000}
XV              20 (1.1 − .1)^{−4.0000}


One seeks in vain through these binomials for any approach to q very small and positive and n very
large and positive. In no case does n approach the number of men in an army corps, say 50,000,
or q equal the chance of a death from the kick of a horse, say, .0000122! It seems impossible by
clubbing such equations together to give any satisfactory proof that the Poisson-Exponential really does
apply to individual cases. In the 20 years involved, there were doubtless great changes in both
the training and the personnel of each army corps, and the results obtained may be just as much due to
such causes as to the errors of small samples.
not we a posteriori demonstrate it from the data themselves? Child suicide may
be influenced by example, by environmental conditions in different districts,
possibly even by meteorological conditions in different years. Again, even in
different army corps the conditions may be far from uniform, the spirit of the
corps, the teaching with regard to the handling of horses, the experience of past
life according to whether the corps is raised in town or rural districts may all tell.
Even Bortkewitsch before he gets his best fit removes four corps or 80 observations
from his data. We do not criticise this removal, but even unremoved he says the
fit of theory with experience leaves “wie man sieht, nichts zu wünschen übrig”
(p. 25). But the binomial is before removal:
280 (1.085,714 − .085,714)^{−8.155}

in which q is not very small and is negative, and n is not very large and is not
positive. It is true that the probable error of q for q insignificant is in this case
±.0570, but this only shows that the data were insufficient in quantity to
determine whether the exponential could be applied or not.
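
The probable errors of q on the hypothesis q = 0 that are quoted in this paper (±.3016 for 10 observations, ±.0901 for 112, ±.0674 for 200 and ±.0570 for 280) are all numerically consistent with .67449 √(2/N), where N is the number of observations. That expression is not stated in this section; it is inferred here from the quoted figures and should be read as an assumption. A short Python check, with a hypothetical function name:

```python
import math

def probable_error_of_q_at_zero(num_observations):
    """.67449 * sqrt(2 / N): an assumed form, inferred from the quoted values."""
    return 0.67449 * math.sqrt(2.0 / num_observations)

# Numbers of observations met with in the text.
for n_obs in (10, 112, 200, 280):
    print(n_obs, round(probable_error_of_q_at_zero(n_obs), 4))
```

Seen this way, the probable error shrinks only as the square root of the number of observations, which is why ten yearly returns cannot separate a small positive q from a moderately large negative one.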


(9) Mortara’s Cases. 

Mortara* in an interesting paper has realised the possibility of repetitions not
being independent and has discussed a constant Q′, by which he proposes to test
such influence. This quantity Q′ should be unity, if the Bortkewitschian hypothesis
can be applied. He then takes 16 or 17 districts with records of 10 years,
and calculates the mean number of deaths from some special cause per year, say,
for each district for those years. If this mean number exceeds 10, he casts out
that district, presumably on the ground either (i) that such a number is no
longer small, or (ii) that it differentiates the district from those with lower
numbers. Thus Bologna with 10.9 deaths by murder is excluded and Bergamo
with 8.4 is included, although Q′ = 1 for both. Bologna with 7.1 deaths from
smallpox is included, but Pavia with 12.3 is excluded although the Q′ of the
former is 2.5 and that of the latter 1.7. What method should be employed in
dealing with the frequency of the excluded districts, which may amount to 50 %
of all districts, is not discussed. Having thus reduced his available districts,
Mortara proceeds to apply the exponential to each individual district; he adds up
the results for each district and compares his totals with the observed totals. It
will thus be observed that he fits his exponential to ten observations, and then adds
together five or more districts to get his totals. We can equally well apply this
process by fitting a binomial to each 10 observations and then adding up such
results. But it is quite clear that on the basis of ten observations, it is, owing to the
large probable errors, wholly impossible to assert whether a binomial of the kind
required by the Bortkewitsch-Mortara hypothesis, i.e. one of very small positive q
and very large positive n, really is justified. We can illustrate this at once from
Mortara’s Tables (see his pp. 42 and 45) for deaths from Chronic Alcoholism. The


* “Sulle variazioni di frequenza di alcuni fenomeni demografici rari,” Annali di Statistica, Serie V.
Vol. IV. pp. 5–81. Roma, 1912.
observed numbers, and those deduced from the binomials are given in the
accompanying table. At the foot are the observed totals, Mortara’s exponential 
totals and the binomial totals. 


TABLE XII. Deaths from Chronic Alcoholism. 


Oe Cason aber) |) ling | 9 | io | ar | a2 las |e | 
| | | | 
| | | i. i st | | 
Calabria 1 ® 4 — 2 = |) = | | — | — | Observed 
1:49] 2°84) 2°70) 1°71 81]; -31}) -10] -03)| -01 = | —- | Mortara 
118} 2°85} 3:06} 1:91 "76 | °20} -03 | - | = | | Binomial 
Foggia 1 2 4 — | 2 oS || | | | O. 
1:00} 2°30] 2°65] 2:03) 1:17 54 | “21 07 | °02] -O1 | M. 
96) 2°29) 2°70 | 20S eelaliSae <b3 19-06 Ol; — | B. 
Siracusa 2 1 3 — 2 2) | - —;/;—|— | 0. 
*82| 2°05] 2°56] 2°14 34 67 | 23) “10 | -03)| “Ol — — |/—}]}—|— |M. 
112) 2:16). 2°33} 1°85] 1:21] ‘69 S4ay | kr | S07) 03 Ol B 
| 
Potenza 2 — 2 2 1 1 ea ee | = /O 
4 1°30) 2°09) 2°23) 1:78) 1:14 Gil 288 O04 ol; -- |—/;—}|]— |M 
78| 1:6 1:95} 1°80] 1°41] -98 63} °38 21 12} ‘06; °03; -01} 01] — | B 
Catanzaro | 1 1 Smee eee = 1 1 1 — | — Re pel |= Peony 6) 
15 s63= l32.|- 1-85)| 1:95.| 1:63) 1-14) 369 | +36 17 ‘OT (0953 |) COE |) |) Se 1 AE 
Solmelecon LAG (0 1363) Peli | 95 75] 57 | +43 31 23) °16/ 12) °08 7/B 
Salerno 1 1 il _— 2 — il 1 2 — | — ae Oz 
06 31 e19ee eso) 72") 75a 1491-09) -69)) <39 20 9} 04] °0 “Ol | M. 
40 86) 1:18} 1°31} 1°27)1°14] °95 lide | s09 45 33 | °24 | ‘17 22 1B 
| | 
Cosenza 2 = I = l Sy 8) 1 a el =| 1}/—|— |0O. | 
06 29 foe 29) |e e1e68i\ dere Io 112 73| °42] °22 0) | 05.) -02)| 01) M 
43 88} 1 ele 7d D30) TiO 93 75 59 | °44 33 24/)°17|°13| °33)B. 
Bologna = | = 3 ea al i || ea ae een eee 1 S| 0) 
‘Ol 06 21 49) +88] 1:24] 1°47] 1°49 | 1°32 | 1:04 74| °48) °28/°15|) °14) M. 
‘40 46 97) 1:06] 1:05] -98 87 76) ‘64 | 53 43°35 8] “21 7 133 
: | | | if | 
| | 
Totals 10 8 Qi alin 4: WM BY i | 8} il il 2 1 | — 2 | O} 
 4.00 |  9.78 | 13.07 | 13.09 | 11.33 | 9.03 | 6.81 | 4.87 | 3.27 | 2.08 | 1.24 |  .70 | .38 | .19 |  .16 | M.
 6.15 | 12.75 | 14.82 | 12.64 |  9.28 | 6.57 | 4.70 | 3.46 | 2.54 | 1.88 | 1.38 | 1.02 | .75 | .55 | 1.43 | B.
| | aera bao Z = 


The following are the binomials for the 8 districts out of 16 which Mortara
has selected.

Reggio Calabria    10 ( .7842 +  .2158)^{[illegible]}
Foggia             10 ( .9609 +  .0391)^{[illegible]}
Siracusa           10 (1.3000 −  .3000)^{[illegible]}
Potenza            10 (1.5500 −  .5500)^{[illegible]}
Catanzaro          10 (2.7524 − 1.7524)^{[illegible]}
Salerno            10 (2.3510 − 1.3510)^{[illegible]}
Cosenza            10 (2.5308 − 1.5308)^{[illegible]}
Bologna            10 (3.3161 − 2.3161)^{[illegible]}
Examining these we see that there are only two in which q and n are positive
and only one in which q is small and positive and n moderately large. The probable
error of q for 10 observations on the assumption that n is very large and q very
small is ±.3016 and is quite inconsistent with the last four districts being samples
from exponentially distributed frequencies. The other four districts may or may
not belong to such frequencies; the data are wholly inadequate to determine
whether they do or not. Reggio Calabria and Foggia have the lowest Q′’s,
i.e. .9 and 1.0. But that six districts out of an already selected eight give
negative q and a seventh a relatively large q and small n suggests the inapplicability
of the hypothesis adopted. If we seek for “goodness of fit” of the totals, we find:


          Binomial    Exponential
χ² =       25.12        47.92
P  =       .0336        .0000


Thus the odds against the binomial system are 28 to 1, but the odds against
the exponential are enormous. It does not seem possible to justify the treatment
of such data by the use of the Poisson-Exponential.


Let us turn to a second of Mortara’s illustrations, that of deaths from small-pox.
He rejects first six out of the 17 districts, the remaining eleven are given in
Table XIII. The districts give the following binomials:


Venezia      10 (  .9500 +   .0500)^{[illegible]}
Bologna      10 (  .9889 +   .0111)^{[illegible]}
Treviso      10 ( 2.2000 −  1.2000)^{[illegible]}
Pavia        10 ( 1.8000 −   .8000)^{[illegible]}
Cagliari     10 ( 4.5190 −  3.5190)^{[illegible]}
Padova       10 ( 3.6833 −  2.6833)^{[illegible]}
Verona       10 ( 5.6000 −  4.6000)^{[illegible]}
Brescia      10 ([illegible])
Bergamo      10 ( 2.3821 −  1.3821)^{[illegible]}
Catanzaro    10 (15.6128 − 14.6128)^{[illegible]}
Vicenza      10 ( 3.4854 −  2.4854)^{[illegible]}


Out of the eleven cases only two give q small and positive; not a single one
gives for q anything like the chance of a death from small-pox in the district, nor
for n anything like the population of the district. There is an increasing divergence
from the positive binomial as Mortara’s Q′ increases in value. We see that in nine
cases, however, a negative binomial not the exponential is required to describe the
frequencies. The probable error of q, for insignificant q, is as before ±.3016, and
therefore it is improbable that q is zero in at least 9 out of these 11 districts.


Examining the totals we find
          Binomial       Exponential
χ² =       9.64*           570.79
P  =    [illegible]       .000,000
TABLE XIII.
Deaths from Small-pox (1900–1909).
: fy | 12 or 
0 1 2 8 4 5 6 : g er ea lee | more 
Venezia 4 5 = 1 7 || le Observed | 
4:49 | 3°60] 1°44 38 08 01) -- — = — —|— — | Mortara 
4°40} 3°71} 1°46 36) 06) ‘01 -—— | Binomial 
Bologna 4 4 1 1 — —|/|— —}|} — |90. 
4:07 | 3°66} 1°65| 49 ear Nfs ee: ©) een |r| ean cn | i ee IVT 
4:04} 3°68} 1°65 “49 ‘ll 02 Ol; — |) — B. 
Treviso 5 3 7 pe == 1 | | 0! 
3°68 | 3°68} 1°84 61 15| :03 ‘OL| - - M. 
Halls 236) 1-18 | ‘61 By ON ‘09 ‘05 03) ‘Ol; — | — = B. 
Pavia 4 3 el eee fee lL catty tea 0. 
SO) 382624) 2:17 87 26 ‘06 Ol = | = M. 
4:14] 2°76 e538) 79 40 19 09 O05 02 01} ‘01 |) — | = B. 
Cagliari 5 1 1 1 SV es eel aes peo ee eee | a ea ee eS 
OS Ocou eouiOln woo “99 42 IL) 04 OL y= —}— = M. 
4:07] 1°89} 1°17 79| 55! -39| -28| -21| -15| -11/:08|-°06/ -25 |B. 
Padova 3 3 _ 2 = 1 a |e == = 1/— = 7 |KO): 
‘91 | 2°18) 2°61} 2°09] 1°25 ‘60 24 08 03 Ol}; — | — — M. 
o2'| 2°03) 1°40 98 MON 250 36 26 19 13°] “10 7 16 | B. 
Verona 4 3 — 1 = = 1 ome ee oe |e | i O. 
OM Ql Sei :6la|) 22098) 125 ‘60 | +24) -08 03 ol) — J] — — M. 
4:07 | 1°74] 1:09 ‘75 4 40 | °31 733 18 14 | -11 | ‘09 Fata) || 15%, 
Brescia 2 3 2 2) = == = = a ee | 13 0) 
ES alan le QO 2 Oil) || a2, 1°82 | 1°20] ‘66 31 33 05 | 02 | — — M. 
4°99) 1°42 87 6 47 coll 30 24 20 17 | *14 | °12 79 |B. 
Bergamo 2 — 2 2 — | 1 — 1 1 1 }/—/—] — |9. 
*20 79)| 1°54) 2:00)) 1°95 1°52 99 5)5) 27 12 | 04! -02 ‘Ol | M. 
Hotere dol 1:57) 1:46) 1723 98 74 54 38 he | Altsy | aly 24 |B. 
| 
Catanzaro 3 3 1 1 1 |g en |e | 1* | 0. 
*20 “ON 1254.1 2:00)) 5951) 1:52 "99 YD) ||) 47/ 12 | :04 | 02 ‘O1 M. 
4°80 |} 1:20 el 50 38 ill 25 21 18 LG 4 | el2s 04s 1B: 
Vicenza 3 = 1 1 1 1 = 1 1 = — | — 1 O. 
‘17 GSae eso alee 1:95 | 1°60 | 1:09 BS} 15 | 706 | °02) -O1 M. 
Qe SOs da? 1eQBnle LeODie 282, 65 51 39 30 | SBA PI | °48 B. 
lias ae | 
Totals Somos, ie i.| 2 2 CRden No Wo sor ery) wae. Os 
19.24 | 24.97 | 21.50 | 16.54 | 11.76 | 7.58 | 4.38 | 2.25 | 1.07 | .46 | .16 | .06 | .03 | M.
40°25 | 23°70 | 14:05 | 8°58] 5°78 | 4°16 | 3:08 | 2 L254 ed 59 Olles7ios eo soln b 
| 


* 1 at “12 or more” in cases of Brescia and Catanzaro was found to signify 1 at 20 in the case of
Brescia, and 1 at 27 in case of Catanzaro, if the means were to agree with those given by Mortara.
In other words the binomials give a reasonable total fit, the exponentials a
practically impossible one.


But there is another question to be asked in such series as those of Mortara:
What justification is there in cutting off at 10 cases, say of murder? A province
may have a million inhabitants and, perhaps, 40 murders occur in a year*. Hence
the binomial is for ten year returns

$$10 \times \left(\frac{24{,}999}{25{,}000} + \frac{1}{25{,}000}\right)^{1{,}000{,}000},$$

but this is as close as anything can be desired to the exponential series. It may
be reasonable to apply a separate series to districts giving 4.2 and 36.6 murders
per annum respectively, but it is difficult to see why the latter district should be
altogether excluded from treatment. If the theory of the binomial be applicable
at all, then it applies practically as well to districts with 40 murders as to districts
with 4; for, we need no indefinitely small q to get a closely exponential series.
If we take the case of deaths by murder, Mortara has retained only 6 out of 16
provinces, yet his criterion Q′ (see his Table, p. 51) is not more divergent from
unity for the rejected provinces than for those retained; the binomials are indeed


Reggio Treviso    10 ( .7000 + .3000)^{[illegible]}
Venezia           10 ( .5619 + .4381)^{[illegible]}
Vicenza           10 ( .9571 + .0429)^{[illegible]}
Padova            10 ( .4774 + .5226)^{[illegible]}
Pavia             10 (1.8162 − .8162)^{[illegible]}
Bergamo           10 ( .8857 + .1143)^{[illegible]}


only one of which gives q small and positive and n large.
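
The remark made above, that a province of a million inhabitants with 40 murders a year gives a binomial practically indistinguishable from the exponential, can be checked numerically. The following Python sketch is an added illustration, not part of the paper; it evaluates each term through logarithms (using `math.lgamma` and `math.log1p`) because the binomial coefficients for n = 1,000,000 are too large to handle directly. The function names are hypothetical.

```python
import math

def binomial_pmf(k, n, q):
    """Term k of the binomial (p + q)^n, computed through logarithms."""
    log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(q) + (n - k) * math.log1p(-q))
    return math.exp(log_term)

def poisson_pmf(k, m):
    """Term k of the Poisson-Exponential with mean m."""
    return math.exp(-m + k * math.log(m) - math.lgamma(k + 1))

n, q = 1_000_000, 1 / 25_000   # a million inhabitants, 40 murders a year
m = n * q

largest_gap = max(abs(binomial_pmf(k, n, q) - poisson_pmf(k, m)) for k in range(101))
print(f"mean = {m:.0f}; largest difference in any term up to k = 100: {largest_gap:.2e}")
```

The largest difference between corresponding terms is far below anything that ten yearly returns could reveal, which is the sense in which a mean of 40 is no less "small" than a mean of 4 when the population is large.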


The mean Q′ for the retained provinces is .967 with a range from .7 to 1.4 and
for the rejected 1.03 with a range from .8 to 1.4. Even if (which is not the case)
the probability of an individual being murdered were too great for the
exponential, it ought to follow the binomial, but this, as a rule, it does not do, unless
we give some wholly new interpretations to q and n; the actual values render the
theory of the binomial as stated inapplicable.


(10) Mortara’s Criterion. 
As a matter of fact the only test of whether an exponential will legitimately
fit a given series or not is to determine the binomial (p + q)^{n} and ascertain
whether p is slightly less than unity. But:

$$p = \frac{npq}{nq} = \frac{(\text{Standard Deviation})^2}{\text{Mean}}.$$


* We assume that each individual is equally likely to be murdered. But if there be a graduated 
probability for murder throughout the community, what right have we to apply Poisson’s series at all? 
The essential basis of the application—equal chance of each individual—is wanting. 
Now if m_s be the number of deaths, say, occurring in any year and there be
l years under consideration, then:

$$(\text{Standard Deviation})^2 = \frac{S_1^l\,(m_s - nq)^2}{l},$$

or, if we use the form preferred by Bortkewitsch*

$$(\text{Standard Deviation})^2 = \frac{S_1^l\,(m_s - nq)^2}{l - 1}.$$

Hence:

$$p = \frac{S_1^l\,(m_s - nq)^2}{(l - 1)\,nq}.$$

This in other notation is Mortara’s Q′², the only criterion he actually uses,
provided by his equation (17 ter), p. 18. Thus his Q′, which he says must not
differ much from 1, is only √p, and it would be better to use p, which has a
direct physical meaning, than Mortara’s Q′ = √p. Clearly Mortara’s somewhat
elaborate process of deducing Q′ does not amount to more than saying: Fit a point
binomial and test if p is slightly less than unity. We contend that it is best
straight off to fit the binomial.
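
The identity between Mortara's criterion and the binomial p can be sketched for a single district as follows. This Python fragment is an added illustration: the ten yearly counts are hypothetical (they are not taken from Mortara), nq is estimated by the observed mean, and the function name `mortara_criterion` is an assumption of this sketch.

```python
import math

def mortara_criterion(yearly_counts, use_l_minus_1=True):
    """Return (p, Q') with p = (standard deviation)^2 / mean and Q' = sqrt(p)."""
    l = len(yearly_counts)
    mean = sum(yearly_counts) / l                        # estimate of nq
    sum_sq = sum((m_s - mean) ** 2 for m_s in yearly_counts)
    divisor = (l - 1) if use_l_minus_1 else l            # Bortkewitsch's form uses l - 1
    p = sum_sq / (divisor * mean)
    return p, math.sqrt(p)

# Hypothetical ten-year record for one district (illustrative only).
counts = [2, 0, 1, 3, 1, 0, 2, 4, 1, 1]

p_bortkewitsch, q_prime_bortkewitsch = mortara_criterion(counts, use_l_minus_1=True)
p_plain, q_prime_plain = mortara_criterion(counts, use_l_minus_1=False)

print(f"divisor l - 1: p = {p_bortkewitsch:.4f}, Q' = {q_prime_bortkewitsch:.4f}")
print(f"divisor l:     p = {p_plain:.4f}, Q' = {q_prime_plain:.4f}")
```

For so short a series the choice of divisor is enough to carry p from one side of unity to the other, which illustrates how little ten observations can decide.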


It is true that Mortara does not reach his Q′², our p, by the simple process of
asking whether the binomial is one with a positive probability less than unity.
He endeavours to obtain it by considering whether there is “lumpiness” in the
observations. But it seems to us clearer and briefer to ask: Are the contributory
cause-groups independent as in teetotum spinning? If so, the data will fit a true
binomial and p will of necessity be a positive quantity less than unity. If they
are not of this character then p must of necessity be greater than unity. It is of
interest to see how Mortara’s test of dependence of contributory cause groups
leads to a criterion, but he actually only gets his Q′², i.e. our binomial p, after
a series of hypotheses which much limit, and that in no very obvious manner,


* The use of √l or √(l − 1) in the value of the standard deviation when l is small has been several
times discussed. It may be dealt with as follows: The probable errors of a mean as deduced by the
two processes are

$$E = .67449\,\frac{\sigma}{\sqrt{l}}, \qquad \text{and} \qquad E' = .67449\,\frac{\sigma}{\sqrt{l-1}},$$

now

$$E' = .67449\,\frac{\sigma}{\sqrt{l}}\left(1 + \frac{1}{2l} + \ldots\right) = .67449\,\frac{1}{\sqrt{l}}\left(\sigma + \frac{\sigma}{\sqrt{2l}}\cdot\frac{1}{\sqrt{2l}} + \ldots\right).$$

Now the probable error of σ is .67449 σ/√(2l), and 1/√(2l) is less and often much less than .67449.
Hence if we only know σ from the observations themselves, and this is the usual case, we have:

$$E' = .67449\,\frac{\sigma'}{\sqrt{l}},$$

where σ′ differs from σ by a quantity usually far less than the probable error of σ. In other words the
refinement of using E′ for E is idle having regard to the accuracy of our observations; and the form
used by Bortkewitsch and Mortara with √(l − 1) for √l is of no importance.
the nature of those contributory cause groups. Of course if their dependence
were of the nature of successive draws from a pack, then the result would be
a hypergeometrical series and Q′² would have no physical meaning for the series
at all.

(11) We will deal with one further illustration out of many considered by
Mortara which are of like character. In the case of Marriages of Uncle and Niece
(see Table XIV, p. 65), where the distribution of Q′’s is the most favourable
for his theory, the binomials are


Reggio Marche    10 ( .7000  +  .3000)^{[illegible]}
Umbria           10 ( .9000  +  .1000)^{[illegible]}
Basilicata       10 (1.4000  −  .4000)^{[illegible]}
Sardegna         10 ( .44545 +  .55455)^{[illegible]}
Emilia           10 ( .9818  +  .0182)^{[illegible]}
Abruzzi          10 ( .8429  +  .1571)^{[illegible]}
Lazio            10 (1.2548  −  .2548)^{−12.1646}
Puglie           10 ([illegible])
Veneto           10 (1.3444  −  .3444)^{[illegible]}
Toscana          10 (2.2667  − 1.2667)^{[illegible]}
Calabria         10 (1.8584  −  .8584)^{[illegible]}


of which only one (Emilia) approaches the conditions for an exponential distribution.
If we test the totals at the foot of Table XIV, we find the result much to the
advantage of the binomial, for which P = .902 as against .714 for the exponential.


(12) On Mortara’s own showing nearly all the Q′’s of his numerous series are
greater than unity, and very few of the binomials are positive. If we consider the
distribution of Q′’s given in his work, omitting Table 13 (Deaths from Malaria), we
find a range from .5 to 3.6 with a mean Q′ at

1.2565 ± .0847,
while for the distribution of all the p’s in the binomials we have determined, we
find a range from .4 to 15.6 with a mean p at 2.5655 ± .3817.


These results are sufficient to show that there is no real distribution of p round
the value unity but the binomials have a distinct tendency to be negative.


(13) But the whole theory of Poisson’s exponential law in the hands of Bortkewitsch
and Mortara appears essentially vague. The binomial is built up on the
assumption of the repetition n times of a number of independent events, of which
the chance of occurrence is identical and equal to q. The population is n and the
chance of occurrence q in the case of each individual. The mean frequency of
occurrence is nq = m. But if q be very small we have seen that the series is

$$e^{-m}\left(1 + m + \frac{m^2}{2!} + \frac{m^3}{3!} + \ldots\right),$$
TABLE XIV.
Marriages of Uncle and Niece (1900–1909).


: as 
0 1 ze | 8 pe Nae |e oer |e om ip it | 22 | 23 | 14 15 | 18% 
| | over 
er 
Marche O.| 7 3 = = | | 
M.| 7-41] 2-22] -33] -04 | | | | 
Bale iz 2 | 
Umbria 0. 6 3 1 — — | | 
M 6:06 | 3°03 s(oa wld 02 
B. | 5°90} 3:28 ‘73 08 | -— 
BasiicamOnime. | o3 | — | 1 | — | ; | | | | | 
M 5°49 | 3:29 99 ‘20 03 | | 
B. | 6°04] 2°59 92 31 10 03 Ol. | 
Sardegna O. 2 5 3 = - 
M.| 3°33] 3°66) 2°01 74 20) °05 O1 
Pelpeccores-o6 | 3:03) = | — | — | — 
| | 
Emilia =O. 1 3 2 Zee 1 — 
M Il-l1l| 2°44) 2°68} 1:97] 1°08} °48]) ‘18 05) -Ol 
B 1:09] 2°48} 2°70] 1:98 | 1:08 47| ‘17 05 Ol 
Abruzzi ©, || — SUF whee ail 3 | 2 _- 1 — | — | 
M 61) 1°70) 2°38) 2:23] 1°56 8 Se G 06 02 
B Se elon mcd S|) 2°43 | 1°68 87 34} *11 03) — 
isazion ©. | 1 eal a2 3) — | 2}/—) 1] —) — 
M ADM eA One Dele 224 173!) 1:07 5D 25; 10} ‘03 OL 
B 63} 1°56! 2°09} 2:00} 1°54) 1°01 59 31 14} -06 03 O01} ‘O01 
Puglie O.| — 30 al 2 nh peal 1 72 ae bbe |e fee ae | 
M 27 FOS) wake) Loi U38 | “835 427) =19) 08 03 01 | — | 
B Sone lesOny We77 | 1-80 | 1538 |1:14| -77| °49) 29] +16) +10) :05] :03 | ‘Ol | ‘OL 
| 
Veneto O. 1 = 1 l 3 1 i a 2 | | 
M. 11 SHOP aoa OOM le OO! Ts fn) 28 82) “46 23 10 OA 025-01) 
Bs 21 ‘70 | 1°26] 1°62] 1°67 46 | 1°13 TS Mh > aap 30) 17 09 | 05 | °02 | -O1 | O01 
Toscana O. _ =e 1 2 2 2 1 Le — | — 1 
M. O04 24 66) 1:19) 1:60] 1°73 / 1°56) 1:20} -81 49} °26 13 | 06} ‘02 | 01 | — 
B. 31 SM lsOvaleelaton al 2ifal elie) 1-00 83 | °65 5O 37 27 | 19 | 13} 09 | 06] :10 
| 
Calabria O = - 2 2 1 1 1 ented tg | —| 1 
M — Ol 05:; “16. 36 4) 94] 1:20 | 1°33] 1:32] 1:17 | :95) -70| 48 | -31 | -18] -20 | 
B 00 O03 wi; 26 48 73 96 | 1:16 Deal OL C7 6Ou plaleon I 2a. a6 
: | ea | 
Totals O. 24 24 1} || ile 8 9 5 5 3 it 1 it = |) SE ee al 
 M. | 24.88 | 19.47 | 14.93 | 12.72 | 10.39 | 7.93 | 5.76 | 4.10 | 2.96 | 2.17 | 1.57 | 1.13 | .78 | .51 | .32 | .18 | .20
 B. | 24.22 | 22.16 | 16.46 | 11.73 |  9.35 | 6.88 | 4.98 | 3.74 | 2.84 | 2.19 | 1.71 | 1.29 | .97 | .67 | .48 | .32 | .26
| 


} 
from which n has disappeared, and in this exponential we have seen that
Bortkewitsch and Mortara suppose m small, i.e. 10 or under. We have seen
that there is no reason why m should be absolutely small, and that the name
given by Bortkewitsch to the Poisson-Exponential, i.e. the “Law of Small
Numbers,” is misleading. But supposing the mean occurrence m to be small,
it by no means follows that q need be small and n finite. For if q = .2 and n = 4,
m would be “small,” and the sort of small number with which our authors deal,
but the mere fact that the mean frequency of occurrence was 2 would not justify
our using the Poisson-Exponential for
[binomial illegible]

The fact is that when our authors speak of the deaths in a Prussian Army
corps from the kick of a horse, or the suicides of schoolgirls, or the deaths from
chronic alcoholism as being “small,” they really mean small as compared with the
number of persons exposed to risk. They had probably in mind all the men in
the army corps, all school-girls or all individuals liable to death in the towns
considered. But are all men in the army corps, or only the cavalry, the artillery,
etc., equally liable to death from the kick of a horse? Is every school-girl equally
liable to commit suicide or only a very few morbid and unhealthy minded girls?
Is every individual equally liable to die of chronic alcoholism, or only perhaps the
10 or 12 confirmed and aged drunkards in a town? The moment we realise these
doubts, what is the population n to be considered? It is not m being small, but
the smallness of m/n that leads us to believe that the binomial may have passed into
an exponential. But if only six school-girls per year in a community are in the
least likely to commit suicide, what is the justification for the “law of small
numbers,” if the average number of suicides be .65? Further, if we pass to even
a large community in which the tendency to commit suicide is graded (a very
probable state of affairs), m might be small and n large, and yet since q is not
constant, the binomial and its exponential limit would not be applicable; and this
non-applicability would not depend on “lumpiness,” i.e. contagion or example in
occurrence. Thus the probability might be:

$$(p_1 + q_1)(p_2 + q_2)(p_3 + q_3) \ldots (p_n + q_n)$$

with all the p’s independent (as in spinning differently divided teetotums) and not
correlated (as they would be in drawing successive non-returned cards from a pack).
It would seem therefore that a priori we should not expect the conditions for the
exponential to be fulfilled in most of the cases selected by Bortkewitsch and
Mortara, although with perfect mixing we might expect it in the cases cited
by “Student.”
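
The effect of a graded chance can be illustrated by expanding the product of unequal factors numerically. The Python sketch below is an addition to the text and rests on hypothetical figures: six individuals with a chance of .3 each and a thousand with a chance of .001 each, all independent. The function name `heterogeneous_binomial` is likewise an assumption of the sketch.

```python
def heterogeneous_binomial(chances):
    """Expand (p1 + q1)(p2 + q2)...(pn + qn): distribution of the number of occurrences."""
    dist = [1.0]                      # no individuals yet: certainly 0 occurrences
    for q in chances:
        p = 1.0 - q
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * p        # this individual escapes
            new[k + 1] += prob * q    # this individual is affected
        dist = new
    return dist

# Hypothetical graded chances: a few individuals at high risk, many at very low risk.
chances = [0.3] * 6 + [0.001] * 1000
dist = heterogeneous_binomial(chances)

mean = sum(k * prob for k, prob in enumerate(dist))
variance = sum((k - mean) ** 2 * prob for k, prob in enumerate(dist))
print(f"mean = {mean:.3f}, variance = {variance:.3f}, variance / mean = {variance / mean:.3f}")
```

Although the mean is small and the population large, the ratio of variance to mean falls sensibly below unity, so that the exponential, for which the ratio is exactly one, does not describe the series even in the absence of any contagion.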


(14) In order to test this point on adequate numbers, the ages at death of all 
persons dying over 70 years of age were extracted for a period of three complete 
years from the notices of death in the Times newspaper for the years 1910—1912:
see Table XV. These announcements of death are those of individuals in a fairly 
limited class, which may be considered stable in numbers for these three years.

On the Poisson Law of Small Numbers


Table XVI shows that the announcements of deaths over 70 years of age only 
amount to 3.74 per day for males and 3.52 for females. These are certainly “small
numbers,” but “small” with regard to what? Are we to consider n as the number
of the population which embraces, (i) all the individuals of the limited classes of
the same range of ages as the defunct, (ii) all the individuals announced as dead
on the same day, (iii) all the individuals of whatever ages of the class which
announces deaths in the Times? Or, should we refer to all the individuals in the
community of that range of ages, or the whole community at large, i.e. the chance
that in a population of so many millions an individual over 70 or 80 as the case 
may be will die and have their death announced in the Times newspaper? Well, 
it really does not matter, because if for any one or all of these populations the 


binomial $(p + q)^n$ applied, we should get, if q were small and n large, the Poisson
series

$$e^{-m}\left(1 + m + \frac{m^2}{2!} + \frac{m^3}{3!} + \cdots\right),$$

and this quite regardless of the size of n. If therefore we did find a series in
which q was very small and n large, we might not be able to say to which, if any,
of the above populations n applied. On the other hand the mere fact that m is
small is no justification for the use of the “law of small numbers” as is sometimes
implied. If it be argued that the small number of people who die over 80 and
have their names recorded in the Times are drawn from a small population, we
reply so it may be argued are the school children who commit suicide, the uncles
who feel any inclination to marry their nieces, or the men liable to die of chronic
alcoholism; and we can in the case of the announcement of deaths test the values
of q and n on fairly adequate numbers. As a matter of fact we do not know, in
attempting to apply the Poisson formula, what is the population from which we
are drawing our individuals, and the justification of the Poisson formula lies only
in showing that there actually does exist a binomial for which q is small and
n large. We might imagine that as we got to the higher ages practically every 
person of that age would die, or that in our notation q would be 1 nearly and p be 
a very small quantity ; thus an approach might be made to the Poisson-Exponential. 
But the approach to the Poisson-Exponential arises not through q approaching 
unity but from q becoming very small. Nor again in the lower age groups do we 
find ourselves left with a positive binomial. 
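That the approach to the Poisson series depends on q becoming small, and not on m being small, can be checked directly. The sketch below is an illustration only (the function names are assumptions of this example): it keeps the mean fixed at the m = 3.74 quoted above for males and lets n grow, so that q = m/n shrinks.

```python
import math

def binomial_terms(n, q, terms):
    """First `terms` terms of (p + q)^n for a positive integer n, p = 1 - q."""
    p = 1.0 - q
    return [math.comb(n, x) * q ** x * p ** (n - x) for x in range(terms)]

def poisson_terms(m, terms):
    """First `terms` terms of e^{-m} (1 + m + m^2/2! + ...)."""
    return [math.exp(-m) * m ** x / math.factorial(x) for x in range(terms)]

def max_gap(n, m, terms=12):
    """Largest absolute difference between binomial and Poisson terms of mean m."""
    b = binomial_terms(n, m / n, terms)
    e = poisson_terms(m, terms)
    return max(abs(u - v) for u, v in zip(b, e))

# Same small mean throughout; only q = m/n changes.
for n in (5, 50, 500, 5000):
    print(n, round(3.74 / n, 5), max_gap(n, 3.74))
```

With n = 5 the mean is just as "small" as with n = 5000, yet the binomial is far from its exponential limit, because q is then about three-quarters.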


TABLE XVI.
Constants for Deaths of Aged.

Men.

| Age over | p | q | Probable Error of q | n | Probable Error of n | m | Binomial P | Exponential P |
|---|---|---|---|---|---|---|---|---|
| 70 years | 1.12965 | -.12965 | ±.03314 | -28.8747 | ±7.3734 | 3.7436 | .1355 | .0045 |
| 80 years | 1.12152 | -.12152 | ±.03349 | -14.0703 | ±3.8704 | 1.7099 | .9358 | .1129 |
| 85 years | 1.01903 | -.01903 | ±.02902 | -43.2996 | ±67.5797 | .8289 | .9737 | .9715 |
| 90 years | 1.00654 | -.00654 | ±.02934 | -42.8498 | ±192.3069 | .2801 | .6741 | .6672 |

Women.

| Age over | p | q | Probable Error of q | n | Probable Error of n | m | Binomial P | Exponential P |
|---|---|---|---|---|---|---|---|---|
| 70 years | 1.34012 | -.34012 | ±.04161 | -10.3522 | ±1.2307 | 3.5210 | .8084 | .0000 |
| 80 years | 1.20770 | -.20770 | ±.03294 | -10.4400 | ±1.8309 | 2.1569 | .9686 | .0018 |
| 85 years | 1.14507 | -.14507 | ±.03077 | -8.1447 | ±1.9627 | 1.1816 | .9860 | .1062 |
| 90 years | .98125 | +.01875 | ±.02779 | +29.0573 | ±43.0634 | .5447 | .9848 | .8116 |


In all cases except women over 90 years of age, we find that a negative
binomial best fits the observations. Even in the case of the announcements of
deaths of women over 90 years, we find that the approach of the binomial to the
Poisson exponential depends on

$$\left(1 + \frac{1}{53.3333}\right)^{53.3333}$$

being measured with sufficient approximation by e = 2.71828. But

$$(1.01875)^{53.3333} = 2.69323,$$

and is therefore not a very close approximation, a result shown when we use
a binomial by the substantial improvement in the measure P of “goodness of
fit.” Even in this case we are not prepared to say what is the population for
which the q = .01875 in the case of these announcements of deaths of women over
90 years of age. It can scarcely be that there are only 29 women over 90 years
of age living in the country, whose deaths are likely to be announced in the Times
when they occur. Further the probable error of q is such that actually this case
might equally well be a random sample from material following a negative 
binomial. Analysing our material we see that our first two cases of males and 
the first three of females are such that they could not possibly be random samples 
from positive binomials, the probable errors of q are too small. Next, seven cases 
out of the eight do give actually negative binomials and the eighth might, having 
regard to its probable errors, well be a negative binomial. Thus although our 
daily occurrences are certainly in Bortkewitsch and Mortara’s sense “small numbers,” 
they give no support to the use of a Poisson-Exponential. 
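The way in which a negative binomial emerges from such data may be sketched as follows. This is an illustration only: it assumes that p, q and n are obtained by equating the mean and variance of the observed frequencies to nq and npq, the daily counts are hypothetical and are not the Times data, and the function name is an invention of this example.

```python
import math

def fit_binomial_by_moments(freq):
    """Fit (p + q)^n to a frequency list by equating mean and variance.

    freq[x] is the number of days on which x deaths were announced.
    mean = n*q and variance = n*p*q give p = variance/mean, q = 1 - p,
    n = mean/q.  A variance above the mean therefore yields p > 1,
    q < 0 and n < 0, i.e. a "negative binomial".
    (If variance and mean are exactly equal, q = 0 and n is undefined.)
    """
    total = sum(freq)
    mean = sum(x * f for x, f in enumerate(freq)) / total
    var = sum((x - mean) ** 2 * f for x, f in enumerate(freq)) / total
    p = var / mean
    q = 1.0 - p
    n = mean / q
    return p, q, n, mean

# Hypothetical daily counts (NOT the Times data): days with 0, 1, 2, ... deaths.
example = [30, 80, 120, 110, 80, 50, 30, 15, 8, 4]
p, q, n, m = fit_binomial_by_moments(example)
print("p, q, n, m:", p, q, n, m)

# The check made in the text for women over 90: q = .01875, 1/q = 53.3333.
approx_e = (1 + 0.01875) ** (1 / 0.01875)
print("(1.01875)^53.3333 =", approx_e, " e =", math.e)
```

In every row of the table the product nq reproduces m; for men over 70, for instance, (-28.8747)(-.12965) = 3.7436.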


If it be said that these “small numbers” differ in character from those used 
by our authors, the reply must be: we know in none of these cases the real 
population from which deaths are to be considered as drawn. The chances of 
death are certainly graduated with age, but the chances of suicide are graduated 
with temperament, and the same is true of alcoholism, or again the chance of
death by accident is graduated with occupation. At any rate until those who
support the use of the “law of small numbers” demonstrate its application on
material, where the probable errors are sufficiently small for us to measure the true
value of q and n, no advance can be made. Nor until we have clear ideas of the
population in which the chance is q, is it possible to assert that it may be used 
for the suicides of school children, and the marriage of uncle and niece, and must 
not be used for the deaths of aged people, which certainly occur in “smaller” 
numbers. 


In the illustrations of deaths we have taken, certainly the Poisson-Exponential 
is not the rule, although the distributions appear to approach it, as towards a limit, 
when the number of deaths approaches zero. But our data which show the rule of
the negative binomial appear to show it in no more marked manner than much of 
the data selected by Mortara himself indicate the negative binomial, although owing 
to the sparsity of his material his results are far more erratic and unreliable. Nor 
is Bortkewitsch much behind Mortara in the evidence he produces for a negative 
binomial being as reasonable a description—possibly owing to inherent lumpiness— 
as a positive binomial of these “small number” frequencies. 


(15) Conclusions. 


(a) The Poisson-Exponential gives a fairly reasonable method of dealing with 
the probable deviations of small sub-frequencies in the case of random sampling. 
When the average value of a sub-frequency is not more than 3% of a population,
then Poisson’s formula suffices in most practical cases to determine the range of 
error likely to be made. Tables are given to assist its use. 


(b) The application of the Poisson-Exponential to various data by Bortkewitsch 
and Mortara has hardly been justified by those writers, for they have not tested 
whether the probability q is small and positive and the power n large and positive 
in the cases considered by them. When this is actually done, it is found that 
their hypotheses, having regard to the probable errors of q and n, are largely 
unjustified in the case of their illustrations. Even in such cases where it is 
justified, a binomial gives a better result as measured by the test for goodness
of fit.


(c) Negative binomials repeatedly occur and give just as good fits, where 
they occur, as positive binomials. In the illustrations taken by Mortara, the 
frequency 10 used is so small that it is not possible to assert that either positive 
or negative binomials are demanded by the data. Still the average p of his results 
is very significantly in excess of unity. 


(d) Mortara like Bortkewitsch cuts out of his data straight off all districts 
with, on the average, more than 10 cases in the year. But the q obtained from
20, 40, or even 100 cases in a population of 100,000 is a small q in the sense that
the resulting binomial is adequately expressed by a Poisson-Exponential. There
appears to be no valid reason for such a procedure, except the experience that
many such cases actually give negative binomials*. It seems to us theoretically
unjustifiable to apply the exponential to 8 cases say in a district of 100,000, and
not apply it to 12 cases in a district of 200,000. Actually p may be 1.4 in the
first case and only 0.9 in the second.


(e) We consider that the reasonable method in every case is not to start with 
the Poisson-Exponential, which screens the truth or falsity of the a priori
hypotheses, but to fit a binomial regardless of the magnitude of p. The fact that
quite as good fits are obtained with negative as with positive binomials suggests
that a new interpretation of these cases of “negative probability” is requisite.
Several cases of the interrelation of “contributory cause groups” which provide
a series represented by a negative binomial $(p - q)^{-n}$ have been recognised†.
A general interpretation based on a very simple conception seems needed for 
these demographic cases in which the law of small numbers appears far more often 
to correspond to a negative than to a positive binomial. 


This paper was worked out in the Biometric Laboratory, and I have to thank 
Professor Karl Pearson for his aid at various stages. 


* Can we cite in addition perhaps, the fact that existing tables of $m^x e^{-m}/x!$ do not extend beyond
m = 10?
† Pearson, Biometrika, Vol. IV. p. 208.


THE RELATIONSHIP BETWEEN THE WEIGHT OF THE
SEED PLANTED AND THE CHARACTERISTICS OF
THE PLANT PRODUCED. II.

By J. ARTHUR HARRIS, Ph.D., Carnegie Institution of Washington, U.S.A.


I. INTRODUCTORY REMARKS.


1. In Biometrika, Vol. IX. pp. 11—21, March 1913, were published constants
showing the relationship between the weight of the seed planted and the number 
of pods on the plants produced in twenty experimentally grown series of Phaseolus 
vulgaris. From the economic view point, number of pods is the most important 
character which could have been chosen, total weight of seed matured only 
excepted. But to the student of morphogenesis, or of the physiology of seed 
production, other characters are of equal interest, while the comparison of the 
correlations for various features must yield results of significance. 


The purpose of the present communication is the presentation of the constants 
measuring the influence of the weight of the seed planted upon the number of 
ovules formed and the number of seeds developing in the pods of the matured 
plant. 

These various relationships have now been worked out for a relatively large 
bulk of material. Altogether there are 29 individual series belonging to 5 
varieties, involving 17,953 plants, from which 119,192 determinations of the 
number of ovules and seeds per pod have been made. The reply to the possible 
suggestion that the expenditure of effort in the collection and analysis of such
masses of data is quite unjustifiable is twofold. First, a major portion of the 
labour involved was necessary for investigations not touched upon here. Secondly, 
there are many problems of morphogenesis and physiology which can only be 
solved by the amassing of large series of accurately determined biometric constants 
which when sufficiently numerous may themselves be the materials for statistical 
analysis. The data here contained are recorded in partial fulfilment of such 
requirements for certain definite morphological and physiological problems. 

The present paper is limited strictly to matters of fact; general discussions are 
reserved until further data—much of which is already available in a raw state— 
are reduced.


II. MATERIALS.


The first paper may be consulted for details not entered here. The data
analysed are drawn in part from the series already considered for the relationship
between weight planted and number of pods produced. In addition to the White
Flageolet, Navy and Ne Plus Ultra varieties already treated, several lots of
Burpee’s Stringless and two of Golden Wax are available.


III. ANALYSIS OF DATA.


2. Data for Number of Ovules and Seeds per Pod. 


Tables III—VI, similar to those of the preceding paper, give in a condensed 
form the data for the correlations discussed. 


TABLE I.* Correlation and Partial Correlation Coefficients.

| Series | Number of Plants | Correlation, Weight and Pods, $r_{wp}$ | Number of Pods Examined | Correlation, Weight and Ovules, $r_{wo}$ | Partial Correlation, ${}_p r_{wo}$ | Correlation, Weight and Seeds, $r_{ws}$ | Partial Correlation, ${}_p r_{ws}$ |
|---|---|---|---|---|---|---|---|
| LL | 1141 | -.008 ± .020 | 8043 | .026 ± .008 | .027 ± .008 | -.013 ± .008 | -.013 ± .008 |
| LG | 182 | .066 ± .050 | 806 | .153 ± .023 | .140 ± .023 | -.100 ± .024 | -.103 ± .024 |
| GG | 750 | -.368 ± .021 | 6310 | .018 ± .008 | .029 ± .008 | .004 ± .008 | .016 ± .008 |
| GGH | 583 | .208 ± .027 | 5251 | .045 ± .010 | .019 ± .009 | .024 ± .010 | -.004 ± .009 |
| GGH2 | 499 | .176 ± .029 | 3502 | .093 ± .011 | .083 ± .011 | .063 ± .011 | .049 ± .011 |
| GGHH | 396 | .193 ± .033 | 2656 | -.022 ± .013 | -.042 ± .013 | -.029 ± .013 | -.048 ± .013 |
| GGD | 514 | .159 ± .039 | 1438 | .107 ± .018 | .089 ± .018 | .071 ± .018 | .068 ± .018 |
| GGD2 | 449 | .215 ± .030 | 1227 | .044 ± .019 | .018 ± .019 | .079 ± .019 | .062 ± .019 |
| GGDD | 342 | .137 ± .036 | 807 | .101 ± .023 | .092 ± .024 | .089 ± .024 | .076 ± .024 |
| HH | 1484 | .177 ± .017 | 14029 | .010 ± .006 | -.039 ± .006 | .007 ± .006 | -.054 ± .006 |
| HHH | 1271 | .145 ± .019 | 11230 | -.000 ± .006 | -.030 ± .006 | .016 ± .006 | -.014 ± .006 |
| HD | 1416 | .129 ± .018 | 5581 | -.044 ± .009 | -.067 ± .009 | -.049 ± .009 | -.052 ± .009 |
| HDD | 1204 | .121 ± .019 | 5449 | -.029 ± .009 | -.065 ± .009 | -.010 ± .009 | -.030 ± .009 |
| DD | 513 | .282 ± .027 | 1827 | .098 ± .016 | .009 ± .016 | .050 ± .016 | .008 ± .016 |
| DDD | 459 | .215 ± .030 | 2018 | .044 ± .015 | .000 ± .015 | .046 ± .015 | .006 ± .015 |
| DH | 670 | .258 ± .024 | 5955 | .075 ± .009 | -.005 ± .009 | .076 ± .009 | -.013 ± .009 |
| DHH | 565 | .152 ± .028 | 5019 | .045 ± .010 | .008 ± .010 | .011 ± .010 | -.025 ± .010 |
| USC | 530 | .150 ± .029 | 2569 | .059 ± .013 | .032 ± .013 | .031 ± .013 | .024 ± .013 |
| USS | 680 | .155 ± .025 | 6605 | .023 ± .008 | -.000 ± .008 | .041 ± .008 | .024 ± .008 |
| USH | 361 | .129 ± .035 | 3406 | .032 ± .012 | .001 ± .012 | .037 ± .012 | .020 ± .012 |
| USHH | 224 | .143 ± .044 | 1743 | .112 ± .016 | .098 ± .016 | .011 ± .016 | -.004 ± .016 |
| USD | 312 | .195 ± .037 | 802 | .127 ± .023 | .098 ± .024 | .071 ± .024 | .067 ± .024 |
| USDD | 237 | .241 ± .041 | 851 | .238 ± .022 | .175 ± .023 | .131 ± .023 | .090 ± .023 |
| FSC | 586 | .147 ± .027 | 2876 | .047 ± .013 | .017 ± .013 | .089 ± .012 | .073 ± .013 |
| FSS | 868 | .098 ± .023 | 7809 | .021 ± .008 | .001 ± .008 | .026 ± .008 | .004 ± .008 |
| FSH | 475 | .100 ± .031 | 4541 | .049 ± .010 | .018 ± .010 | -.045 ± .010 | -.073 ± .010 |
| FSHH | 427 | .121 ± .032 | 3837 | .015 ± .011 | -.013 ± .011 | .040 ± .011 | .017 ± .011 |
| FSD | 428 | .130 ± .032 | 1449 | .060 ± .018 | -.027 ± .018 | -.019 ± .018 | -.036 ± .018 |
| FSDD | 387 | .144 ± .034 | 1556 | .037 ± .017 | .013 ± .017 | .047 ± .017 | .024 ± .017 |

* The weight of the seed planted was weighted with the number of pods counted. Thus $\bar{w}$ and $\sigma_w$
differ slightly from those of Table II of the first paper. Sheppard’s correction was used for seed
weight, but not for the integral variates ovules per pod or seeds per pod.


Table I gives the correlations between weight of seed planted and ovules per pod, $r_{wo}$,
and between weight planted and number of seeds matured per pod, $r_{ws}$. The partial
correlation coefficients,

$${}_p r_{wo} = \frac{r_{wo} - r_{wp}\, r_{po}}{\sqrt{1 - r_{wp}^2}\,\sqrt{1 - r_{po}^2}}, \qquad
{}_p r_{ws} = \frac{r_{ws} - r_{wp}\, r_{ps}}{\sqrt{1 - r_{wp}^2}\,\sqrt{1 - r_{ps}^2}},$$

showing the correlation for weight (w) and ovules (o) and weight and seeds (s) for
constant numbers of pods (p) per plant are also given. These require, in addition
to the correlations here given, $r_{wp}$ and also $r_{po}$ and $r_{ps}$, the correlations between the number
of pods per plant and the number of ovules and seeds in these pods. Values
of $r_{wp}$ are available from the preceding paper (Biometrika, Vol. IX. p. 21, Table
VII) and from a supplementary table giving nine additional constants*. For the
reader’s convenience these are reprinted in this table. The values of $r_{po}$ and $r_{ps}$
will be published in connection with another problem.
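The partial correlation for constant number of pods can be sketched in a few lines of Python. The function name is an assumption of this example, and since the values of the pod and ovule correlation are not given in this paper, the figure used for it below is purely hypothetical and serves only to show the arithmetic.

```python
import math

def partial_correlation(r_wx, r_wp, r_px):
    """Correlation of w and x for constant p:
    (r_wx - r_wp * r_px) / (sqrt(1 - r_wp^2) * sqrt(1 - r_px^2))."""
    return (r_wx - r_wp * r_px) / (
        math.sqrt(1.0 - r_wp ** 2) * math.sqrt(1.0 - r_px ** 2)
    )

# r_wo and r_wp for series USDD are read from Table I.
r_wo, r_wp = 0.238, 0.241
# r_po is NOT given in the paper; this value is hypothetical.
r_po_assumed = 0.30

print(partial_correlation(r_wo, r_wp, r_po_assumed))
```

When the pods are positively correlated both with seed weight and with ovules per pod, the correction lowers the coefficient, which is the direction of change seen throughout the table.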


The probable errors have all been calculated on the basis of the number of 
pods examined as N. There is considerable question whether the actual number
of seeds planted should not have been used instead; the degree of trustworthiness 
of a constant is perhaps not greater than is indicated by the lowest number of 
actual measurements (irrespective of the number of associated measures taken). 
The point is not of the greatest practical importance for the present case, since the 
number of series is so large that conclusions can be drawn from the run of the 
constants as a whole and too much weight need not be given to individual series. 
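How much the choice of N matters can be seen from a short calculation. The sketch below assumes the usual formula for the probable error of a correlation coefficient, .67449(1 - r²)/√N, and, as a stand-in for the number of seeds planted, uses the number of plants in Table I; both are assumptions of this illustration rather than statements from the text.

```python
import math

def probable_error_of_r(r, n):
    """Probable error of r: .67449 * (1 - r^2) / sqrt(N)."""
    return 0.67449 * (1.0 - r ** 2) / math.sqrt(n)

# Series USS, weight and ovules: r = .0232, 6605 pods taken from 680 plants.
pe_pods = probable_error_of_r(0.0232, 6605)
pe_plants = probable_error_of_r(0.0232, 680)

print(round(pe_pods, 4), round(pe_plants, 4))
```

On the pods the probable error is roughly a third of what it becomes on the plants, so that a coefficient which looks several times its probable error on the first basis may be quite insignificant on the second.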


A glance at the table shows that the correlations are low throughout. The 
suggestion naturally arises that some of the extremely low values may be due to 
non-linear regression. The regression straight line equations and the results of 
Blakeman’s test† are given in Table II. Here r, η and the straight line equation
for the regression of ovules and seeds per pod on weight planted (in working units)
are determined by the conventional formulae. The final two columns give the
values of

$$\frac{1}{\chi_1}\cdot\frac{\tfrac{1}{2}\sqrt{\zeta}}{\sqrt{1 + (1-\eta^2)^2 - (1-r^2)^2}},$$

where $\zeta = \eta^2 - r^2$ and $\chi_1 = .67449/\sqrt{N}$.
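For readers who wish to reproduce the last two columns of Table II, the criterion may be computed as in the following sketch. The form used is ½√ζ divided by χ₁√(1 + (1 - η²)² - (1 - r²)²), with ζ = η² - r² and χ₁ = .67449/√N; the function name is an assumption of this example, and the agreement with the tabled values is approximate only, since the published r and η are rounded.

```python
import math

def blakeman_criterion(r, eta, n):
    """Blakeman's criterion for linearity of regression.

    zeta = eta^2 - r^2, chi_1 = .67449 / sqrt(N); the value returned is
    (1/2) sqrt(zeta) / (chi_1 * sqrt(1 + (1 - eta^2)^2 - (1 - r^2)^2)).
    """
    zeta = eta ** 2 - r ** 2
    chi1 = 0.67449 / math.sqrt(n)
    denom = math.sqrt(1.0 + (1.0 - eta ** 2) ** 2 - (1.0 - r ** 2) ** 2)
    return 0.5 * math.sqrt(zeta) / (chi1 * denom)

# Series USS, ovules: r = .0232, eta = .0657, N = 6605 pods (test A).
test_a_uss = blakeman_criterion(0.0232, 0.0657, 6605)
# Series DHH, ovules: r = .0445, eta = .0788, with the 565 plants as N.
test_b_dhh = blakeman_criterion(0.0445, 0.0788, 565)

print(round(test_a_uss, 3), round(test_b_dhh, 3))
```

Because N enters only through χ₁, the two forms of the test for any one series stand to each other simply as the square roots of the two values of N.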


All the straight lines are shown in Diagram 1. The empirical means are 
indicated in all of the cases where it can be done without confusion. The slope is 
very slight and the agreement of observed and predicted means not very close, 
especially near the ends of the range, where the number of observations is small. 
There is, however, no clear indication that a curve of a higher order would describe 
the results better than a straight line. This irregularity is precisely what is to be 
expected in cases of low correlation. 


* Harris, J. Arthur, “An Illustration of the Influence of Substratum Heterogeneity upon Experimental
Results.” Science, N.S. Vol. XXXVIII. pp. 345—346, 1913.
† Blakeman, J., Biometrika, Vol. IV. pp. 332—350, 1905.


TABLE II.
Tests for Linearity of Regression.

For Ovules:

| Series | Correlation, r, and Probable Error | Correlation-Ratio, η, and Probable Error | Regression Straight Line Equation | Blakeman’s Criterion, Test A | Blakeman’s Criterion, Test B |
|---|---|---|---|---|---|
| USS | .0232 ± .0083 | .0657 ± .0083 | 5.4230 + .0074 w | 3.720 | 1.688 |
| DHH | .0445 ± .0095 | .0788 ± .0095 | 4.9385 + .0257 w | 3.431 | 1.151 |
| USDD | .2381 ± .0218 | .2978 ± .0211 | 3.6886 + .1001 w | 11.096 | 2.102 |
| GGD2 | .0442 ± .0192 | .1276 ± .0189 | 4.7224 + .0137 w | 3.152 | 1.967 |
| FSS | .0209 ± .0076 | .0403 ± .0076 | 5.5606 + .0153 w | 2.263 | .754 |
| HH | .0098 ± .0057 | .0661 ± .0057 | 5.3600 + .0056 w | 5.159 | 1.678 |

For Seeds:

| Series | Correlation, r, and Probable Error | Correlation-Ratio, η, and Probable Error | Regression Straight Line Equation | Blakeman’s Criterion, Test A | Blakeman’s Criterion, Test B |
|---|---|---|---|---|---|
| USS | .0407 ± .0083 | .0946 ± .0082 | 3.5870 + .0206 w | 5.182 | 2.351 |
| DHH | .0111 ± .0095 | .0541 ± .0095 | 4.1521 + .0106 w | 5.573 | 1.869 |
| USDD | .1313 ± .0227 | .1932 ± .0223 | 2.1840 + .0940 w | 8.712 | 1.650 |
| GGD2 | .0794 ± .0191 | .1760 ± .0187 | 2.4735 + .0846 w | 4.181 | 2.529 |
| FSS | .0261 ± .0076 | .0499 ± .0076 | 3.0712 + .0269 w | 2.793 | .931 |
| HH | .0068 ± .0057 | .0953 ± .0057 | 4.2119 + .0058 w | 8.421 | 2.739 |


Blakeman’s criterion has been applied in two ways, A and B. In the first the
actual number of pods examined has been taken as N. In test B the number of
seeds planted (not the weighted number) has been used in obtaining $\chi_1$. If the
first test be accepted as the proper one, it follows that regression cannot safely be
regarded as linear. But there are two important points to be taken into account.
The correlation ratio η depends upon the squares of the differences in means, hence
it has always a positive value, which may be very substantial because of the errors
of sampling when the number of individuals per array is small. Thus when r
approaches zero η is limited by $\bar{\eta}$, the mean value of η for zero correlation*.
Hence a test for linearity based on a comparison of η with a very low value of r
may be misleading. Again, as pointed out above, the significance of both r and η
should perhaps be tested on the basis of the lowest number of measurements. If
this be done, as it is in test B, there is found very little evidence for non-linear
regression. Certainly, one cannot possibly assert that the low values of r, which
are seen throughout these experiments, are due to the number of ovules (seeds) per
pod at first becoming larger and then decreasing after a maximum is reached as
one passes from the lowest to the highest grade of seed weight.


The results of Table I are also shown graphically in Diagram 2. Here the 
relationships for weight of seed planted and number of pods on the plant developing 
are also indicated as a basis of comparison. The values of both $r_{wo}$ and $r_{ws}$ are in
general conspicuously lower than the low values of $r_{wp}$. But very few of them
drop below the zero bar; one is forced to the conclusion that there is a distinct
though very slight correlation between weight and ovules and between weight
and seeds.

* See K. Pearson, Biometrika, Vol. VIII. pp. 254—256, 1911.




Consider in somewhat greater detail the signs and magnitudes of these 
correlations*, 

Of the 26 values of $r_{wo}$ only 4 are negative. The mean value of the 22 positive
coefficients is +.0673; the mean of the 4 negative is -.0236; the mean of all
(regarding signs) is +.0533.

For the relationship between weights of seed planted and number of seed
matured per pod, $r_{ws}$, 21 constants are positive and 5 are negative. The mean of
the positive coefficients is +.0502; the mean of the negative values is -.0303;
for all 26 correlations the mean (regarding signs) is +.0348.

Thus both correlations are (as is clear from the diagrams) unquestionably 
positive but very low. 

Apparently the relationship for weight and ovules is slightly closer than that 
for weight and seeds per pod, but the difference is too slight to justify any final 
conclusion. 

Consider now the question whether the observed correlations $r_{wo}$, $r_{ws}$ are to be
regarded as direct biological relationships between the two variables w and o or w
and s, or whether they are to be looked upon as merely necessary resultants of
other interdependences. At present, the only other demonstrated correlation
which might tend to bring about sensible values of $r_{wo}$ and $r_{ws}$ is that between
number of pods per plant and number of ovules formed and number of seeds
developing per pod. Since number of pods per plant is known to be correlated
with weight of seed planted, while both number of ovules and number of seeds per
pod are correlated with number of pods per plant, some correlation must be
expected between weight planted and number of ovules and seeds per pod. If
now the observed values of $r_{wo}$ and $r_{ws}$, which are always small, are merely the
necessary resultant of the relationships $r_{wp}$, $r_{po}$, $r_{ps}$, one would expect the partial
correlation coefficients, ${}_p r_{wo}$, ${}_p r_{ws}$, to be sensibly zero. If these partial correlations
are not sensibly zero, it can only mean that there is a direct (causal) relationship 
other than the one just considered between number of ovules (or seeds) and the 
weight of the seed planted. 


The partial correlations and the correlations are shown side by side in 
Diagrams 3 and 4. The lowering of the degree of interdependence between both 
weight and ovules and weight and seeds by the correction for number of pods per 
plant is clearly marked. In a number of cases in which the correlation coefficient 
is positive the partial correlation coefficient is negative. 


Thus only 4 of the 26 values of $r_{wo}$ are negative, while 9 of the partial
correlation coefficients have the minus sign. In only 5 cases is $r_{ws}$ negative, but in
11 of the series the sign of ${}_p r_{ws}$ is negative. The mean values of the partial
correlations are very close indeed to zero. Thus ${}_p\bar{r}_{wo}$ = .0186 as compared with
$\bar{r}_{wo}$ = .0533; ${}_p\bar{r}_{ws}$ = .0099 as against $\bar{r}_{ws}$ = .0348.
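These counts and means can be checked from the entries of Table I. The sketch below is only a convenience for the reader: the values are those read from the table for the 26 series retained (LL, LG and GG omitted), kept as strings so that an entry printed as -.000 is still counted among the negative signs; the variable and function names are assumptions of this example.

```python
# (series, r_wo, partial r_wo, r_ws, partial r_ws), as read from Table I.
rows = [
    ("GGH",  ".045",  ".019",  ".024",  "-.004"),
    ("GGH2", ".093",  ".083",  ".063",  ".049"),
    ("GGHH", "-.022", "-.042", "-.029", "-.048"),
    ("GGD",  ".107",  ".089",  ".071",  ".068"),
    ("GGD2", ".044",  ".018",  ".079",  ".062"),
    ("GGDD", ".101",  ".092",  ".089",  ".076"),
    ("HH",   ".010",  "-.039", ".007",  "-.054"),
    ("HHH",  "-.000", "-.030", ".016",  "-.014"),
    ("HD",   "-.044", "-.067", "-.049", "-.052"),
    ("HDD",  "-.029", "-.065", "-.010", "-.030"),
    ("DD",   ".098",  ".009",  ".050",  ".008"),
    ("DDD",  ".044",  ".000",  ".046",  ".006"),
    ("DH",   ".075",  "-.005", ".076",  "-.013"),
    ("DHH",  ".045",  ".008",  ".011",  "-.025"),
    ("USC",  ".059",  ".032",  ".031",  ".024"),
    ("USS",  ".023",  "-.000", ".041",  ".024"),
    ("USH",  ".032",  ".001",  ".037",  ".020"),
    ("USHH", ".112",  ".098",  ".011",  "-.004"),
    ("USD",  ".127",  ".098",  ".071",  ".067"),
    ("USDD", ".238",  ".175",  ".131",  ".090"),
    ("FSC",  ".047",  ".017",  ".089",  ".073"),
    ("FSS",  ".021",  ".001",  ".026",  ".004"),
    ("FSH",  ".049",  ".018",  "-.045", "-.073"),
    ("FSHH", ".015",  "-.013", ".040",  ".017"),
    ("FSD",  ".060",  "-.027", "-.019", "-.036"),
    ("FSDD", ".037",  ".013",  ".047",  ".024"),
]

def summarise(column):
    """Number of entries with a minus sign, and the mean regarding signs."""
    values = [float(row[column]) for row in rows]
    negatives = sum(1 for row in rows if row[column].startswith("-"))
    return negatives, sum(values) / len(values)

for label, column in (("r_wo", 1), ("partial r_wo", 2),
                      ("r_ws", 3), ("partial r_ws", 4)):
    negatives, mean = summarise(column)
    print(label, negatives, round(mean, 4))
```

The three-figure entries of the table give means agreeing with those quoted to within a unit in the fourth decimal place.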

* I have already shown (Science, N.S. Vol. XXXVIII. pp. 345—346, 1913) that the LL, LG and GG
series are open to question because of the lack of certain precautions in the cultures; while they are
included in the table of fundamental constants to avoid any possible criticism of selection of series they
will be left out of account in the following discussions.


IV. RECAPITULATION.


The facts presented in this paper and in the preceding studies justify the 
following conclusions. 


1. In Phaseolus vulgaris there is a sensible relationship between the weight 
of the seed planted and the number of pods on the plant developing from it. The 
correlation is always low, averaging only about .166, but under proper experimental
conditions the coefficients have always been found to be positive. When experi- 
ments are not made with all necessary precautions substratum heterogeneity may 
completely obscure the influence of seed weight, reducing the correlation to 
practically zero or even bringing about a substantial negative correlation. 


2. There is also a significant positive correlation between the weight of the 
seed planted and the number of ovules and the number of seeds in the pods pro- 
duced by the plant developing from it. These correlations are so low that on 
relatively small samples negative values may be found. They average only about 
one-fifth to one-third the magnitude of the correlation for weight planted and pods 
per plant. 


The relationship for weight and ovules is numerically higher than that for 
weight and seed, but on the basis of the number of series now available the 
difference cannot be asserted to be significant. 


3. Morphogenetically and physiologically, the observed correlations between 
weight and ovules and weight and seeds are to be regarded as the resultant of two 
other correlations, namely, that between the weight of the seed planted and the 
number of pods per plant and that between the number of pods on the plant and 
the characteristics of these pods. This conclusion is based on the fact that the
partial correlation coefficient for weight of seed planted and number of ovules or 
seeds per pod for constant number of pods per plant is practically zero. 


COLD SPRING HARBOR, N.Y.
August 20, 1913. 


ON THE PROBABILITY THAT TWO INDEPENDENT DIS- 
TRIBUTIONS OF FREQUENCY ARE REALLY SAMPLES 
OF THE SAME POPULATION, WITH SPECIAL REFER- 
ENCE TO RECENT WORK ON THE IDENTITY OF 
TRYPANOSOME STRAINS 


By KARL PEARSON, F.R.S.


(1) In Biometrika, Vol. VIII. p. 250, I discussed fully the mathematical
process requisite for measuring the probability that two independent distributions 
of frequency are really samples of the same population. As far as I am aware this 
is the only complete theory of the subject which has been published. I believe it 
to be scientifically adequate, and it has already been applied to a large number of 
problems*. 

Before that paper was published, it had been usual to compare any constants of 
two frequency distributions together, and by a due consideration of their difference 
relative to the combination of their probable errors to determine the probability of 
the identity of those constants. This could be repeated for any number of corre- 
sponding constants, and if theoretical curves of frequency had been fitted, their 
divergence or correspondence measured by the divergence or correspondence of 
their complete series of constants. The method above referred to, however, as 
based on the general theory of sampling, calls for no hypothesis as to the general 
theory of frequency. It takes the observed distributions and measures the prob- 
ability that both are samples from a large population. The population may be 
homogeneous or heterogeneous; provided the samples are truly random samples 
we obtain a measure of the probability of their common origin. 

In the course of a long statistical experience I have learnt that it is wholly 
impossible to reach any safe conclusions as to the identity or non-identity of 
populations by any process of mere graphical comparison of frequency distributions. 


* In actual practice the χ² test of “goodness of fit” should always be made with not too fine grouping
at the terminals, especially when any group in the tails appears to be contributing largely to the
total of χ². This point was recognised ab initio (Phil. Mag. Vol. L. p. 164), and has recently been
re-emphasised by Edgeworth, Journal R. Statistical Society, Vol. LXXVII. p. 198.

The distributions in appearance are wholly dependent on the choice of scales
and the eye alone cannot possibly make any measure of the degree of accordance, 
which will have scientific value. 


In the accompanying Diagram I. for example, we have the frequency distri- 
butions permille of two strains of trypanosomes, (aa) from a Donkey and (bb) from 
a Hartebeeste. These we are told are identical. Below (cc) and (dd) are given the 
frequency distributions of head-breadths for two races, Egyptian and English women, 
separated by 7000 years interval. These strains we know to be different, but the 
eye that judges (aa) and (bb) to be the “same”* might well suppose (cc) and (dd) to be
also the same. Actually when we come to the quantitative measure of divergence,
the probability that (aa) and (bb) are samples of the same thing is P < .000,000,1,
while the probability that (cc) and (dd) are the same is P = .001. In other words it
is 10,000 times as likely that Egyptians of 6000 B.C. and the English of 1680 A.D.
are the same strain as that the trypanosomes from the Hartebeeste and those
from the Mzimba Donkey are of the same strain. Both may indeed be of the “same
strain” if a sufficiently wide meaning be given to the term. But is such a racial
resemblance as we find between the Prehistoric Egyptian woman and the English
woman diluted 10,000 times what we understand in ordinary language by the “same
strain”? All the mathematician can understand by “sameness of strain” is the
identity which corresponds to random samples of the same population. If the identity
has been modified by a long evolutionary process, by markedly differential environ- 
ment or treatment, is it not better to have some measure of a scientific nature of 
the extent of the difference or of the sameness? The eye can never provide any 
judgment of value on such a point. Especially is this the case if the graphs 
represent percentages, as the degree of divergence is of course a function of the 
number employed to determine the percentages. A deviation of frequency by per- 
centages based upon samples of 200 might look to the eye absolutely like the 
deviation of frequency due to samples of 2000, but the scientific measure of the 
probability of sameness would be widely modified. 
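The dependence on the numbers behind the percentages is easily shown. The sketch below assumes the usual form of the measure for two samples, χ² = Σ N N′ (f/N - f′/N′)² / (f + f′), summed over the groups; the percentage frequencies are hypothetical and the function name is an invention of this example.

```python
def two_sample_chi2(f, g):
    """chi^2 = sum over groups of N * N' * (f/N - g/N')^2 / (f + g)."""
    n_f, n_g = sum(f), sum(g)
    chi2 = 0.0
    for a, b in zip(f, g):
        if a + b > 0:                      # empty groups contribute nothing
            chi2 += n_f * n_g * (a / n_f - b / n_g) ** 2 / (a + b)
    return chi2

# Hypothetical percentage frequencies, identical in both comparisons.
pct_a = [5, 20, 40, 25, 10]
pct_b = [8, 24, 38, 22, 8]

# The same percentages, first from samples of 200, then from samples of 2000.
small = two_sample_chi2([2 * x for x in pct_a], [2 * x for x in pct_b])
large = two_sample_chi2([20 * x for x in pct_a], [20 * x for x in pct_b])

print(small, large)
```

The two pairs of percentage curves would be indistinguishable to the eye, yet the second χ² is ten times the first, and the corresponding probabilities of sameness differ accordingly.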


That the reader should have evidence how excellent is the test, I have taken 
the cranial lengths (Flower’s measurement) of 67 female skulls dug up in Liverpool 
Street and compared them with the like lengths of 142 female skulls dug up in 
Church Lane, Whitechapel. It is possible that both these sets of crania formed 
part of the contents of plague pits, or there may be an interval in date of a century 
between them†. Diagram I bis shows the data arranged as percentage frequency 
curves. The χ² for 17 groups proceeding by 2 mm. ranges = 19.38, giving P = .250, 
or once in four trials, if the material drawn from were the same, we should obtain 
pairs of samples more divergent than the pair recorded. In other words we can 
be confident that the Liverpool Street and Whitechapel crania represent persons 
of the same strain. That P = .25 and not, say, .85 may be merely a result of 
random sampling, or it may arise from some difference of period or social class. 

* An attempt to define the word “sameness” as used by writers on trypanosome strains would 
doubtless serve a useful purpose, and emphasise the fact that we can only define “sameness” by appeal 
to the theory of sampling, or by the adoption of some quantitative measure of the grade of likeness. 

† See Biometrika, Vol. III. p. 191 and Vol. V. p. 86. 
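The value P = .250 quoted for the crania can be checked without tables. The sketch below evaluates the tail probability of χ² by its closed-form series. It assumes that 17 groups correspond to 16 degrees of freedom (groups less one); that convention is an assumption made here, and the function name `chi2_tail` is hypothetical.

```python
import math

def chi2_tail(x, df):
    """Probability that chi-square with df degrees of freedom exceeds x."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x).
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(root / math.sqrt(2)) + 2 * density * total

# Assumption: 17 groups are entered as 16 degrees of freedom.
p_crania = chi2_tail(19.38, 16)
print(f"P = {p_crania:.3f}")
```

Under that assumption the result is close to one chance in four, in agreement with the figure in the text.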


(2) In a long series of papers recently published in the Proceedings of the 
Royal Society, Section B, conclusions are reached as to the identity of various 
strains of trypanosomes. These conclusions are largely based on a comparison of 
graphs of the frequency percentages obtained by measurement of hundreds of 
trypanosomes. 

To some extent mean values are given for the different strains, but no argu- 
ments whatever can be based on them, for in no case has the probable error of the 
difference been calculated. Even if it had been calculated, this constant alone 
would not have sufficed to determine the sameness or difference of the strains. 
Further, the percentages of various forms in the strains are sometimes given; 
but again no attempt has been made to determine whether the differences of 
these percentages are or are not significant. It seems sufficient here to consider 
the far more valid test of the sameness or diversity of the frequency-distributions 
as a whole. 

I shall divide my investigation into four parts: 

(i) The probability of identity of the strains on the evidence presented in the 
reports of the Commission of the Royal Society, Nyasaland, 1912. 

(ii) The probability that the host or the animal in which the trypanosome is 
cultivated makes essential differences in the distributions of frequency. 
(iii) The probability that the strains are alike after allowance has been made 
for the host. 


(iv) The nature of the heterogeneity which is statistically demonstrable in 
the bulk of trypanosome measurements. 


I should like before considering the material to indicate one or two very 
important points. I am not concerned here with the truth or error of the con- 
clusions drawn by Sir David Bruce and his collaborators. I am only concerned 
with the nature of the process by which they have drawn their inferences. That 
process consists in a measurement of the individual trypanosomes and an appeal 
to the statistics of these measurements—in short to what I should term biometric 
reasoning. There may well be other means of discussing the resemblances of the 
different strains of trypanosome,—either by microscopic examinations of diver- 
gencies in the life history of the different strains or by differentiation in their 
action on different hosts, or otherwise. But in the present case the appeal to statistics 
of measurement has been made. Drs Stephens and Fantham in their paper on 
T. rhodesiense (R. S. Proc. Vol. 85, B. p. 227) actually term their work a “biometric 
study,” and the later papers of Sir David Bruce and others are no less “biometric.” 
Now if an appeal be made to statistics, then by a statistical method alone can the 
answer be given. Further, that method must be the analysis of the modern fully 
equipped and highly trained statistician. Such a statistician, and he alone, can 
assert or deny on the basis of statistics the probability of any of these strains 
of trypanosomes being samples of the same population; he alone is in a position 
to judge the value of the evidence provided by the frequency distributions. If he 
finds substantial “divergence” where Sir David Bruce and his collaborators 
assert “sameness,” then either statistical theory is wrong, or Sir David Bruce 
understands by “sameness” something quite different from the “sameness” of the 
statistician, and something which cannot be judged by the methods of statistics, to 
which accordingly no appeal should have been made, or only an appeal after a long 
series of control experiments. The “sameness” postulated by Sir David Bruce is 
something quite incompatible with the “sameness” found by the statistician when 
he investigates two samples of 100 crania of the same race or two samples of 1000 
blood corpuscles of two series of frogs of the same race. It is what the statistician 
calls marked divergence and not sameness. If it be asserted that the extreme 
divergence actually existing between the strains of trypanosomes statistically dis- 
cussed is due to difference of individual host and not to difference of strain, it will 
be clear that the divergence and not the sameness ought to have come out of the 
statistical investigation, and then control investigations ought to have been made to 
explain that divergence by environmental or other differences. But this is a priori 
to assume the identity of the strains and a posteriori to seek an explanation of 
marked divergence deduced statistically, whereas in the actual papers this great 
divergence is assumed to be statistical sameness and this sameness used as an 
argument for identity of strains. The statistician coming to the data critically 
does not of course assert dogmatically that any two strains are not of identical 
race. What he does assert is that no argument for the sameness of the strains can 
be based on the statistics provided; for these actually show wide divergence, and 
he asks, if the strains are a priori assumed to be “same,” for a full a posteriori 
examination of the sources of the divergence. 


The scope of the present paper is not the complete investigation of all the data 
of the Royal Society Commission, nor an endeavour to obtain from the published 
data the full conclusions which may be legitimately drawn from them. Its purpose 
is to illustrate the statistical methods which ought to be applied to such material 
and to indicate the essential necessity of control experiments on strains known to 
be the same or accepted as different. A point should be noted here, namely, that 
I have only found two cases where the strains on the basis of the statistical 
evidence are said to be different. The first is in the case of Trypanosoma evansi 
and Trypanosoma brucei. Sir David Bruce* gives (1911) the frequency distribution 
of lengths of 820 individuals of T. evansi and compares it by means of a 
graph of percentages with T. brucei. The percentages of the latter appear to be 
deduced from the lengths for two series of 160 trypanosomes and 200 trypanosomes 
cultivated in a variety of animals (Uganda, 1909, and Zululand, 1894) and published 
in the preceding year†, but no reference is given in the paper to the 
original of the percentages in the graph, nor is any demonstration given in the 
paper of 1910 of the statistical sameness of the Uganda and Zululand strains; 
there is merely said to be “marked resemblance‡” where the trained statistician 
finds marked divergence§. Stephens and Fantham|| use the curve of 1911 to 
assert that there is a “general resemblance between the curves representing 
the measurements of these trypanosomes (T. gambiense, T. rhodesiense, T. brucei)” 
and consider that this “general resemblance” shows that “the method is a 
trustworthy one.” It is not clear what “the method” referred to really sig- 
nifies. The statistical comparison of means and maximum and minimum 
lengths without statement of probable errors, and the mere graphical exami- 
nation of frequency curves are wholly inadequate to determine sameness or 


* R. S. Proc. Vol. 84, B, p. 186, 1911. 
† R. S. Proc. Vol. 83, B, pp. 5 and 11, 1910. 
‡ R. S. Proc. Vol. 83, B, p. 12. 
§ The two distributions are as follows: 

Length (microns) | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | Totals
Uganda, 1909     |  — |  — |  1 |  2 |  4 |  6 | 10 | 26 | 14 | 14 | 12 |  9 | 12 |  6 |  6 | 12 | 10 |  7 |  1 |  2 |  3 |  3 |  — |    160
Zululand, 1894   |  — |  3 | 11 | 11 | 20 | 32 |  ? |  4 |  4 |  3 |  5 |  3 |  7 |  7 | 10 | 13 | 13 |  8 | 10 |  8 |  3 |  1 |  3 |    200

(The entry marked ? is illegible in this copy.) 

These give χ² = 101.18, leading to P < .000,001, or not once in a million trials would two so divergent 
distributions be obtained by sampling the same population. 
|| R. S. Proc. Vol. 85, B, p. 233, 1912. 
divergence. Only a month later than Stephens and Fantham’s paper appeared 
another paper by Sir David Bruce and others* comparing human trypanosomes 
from Nyasaland with T. brucei and T. rhodesiense and T. gambiense. But the 
curve for T. brucei is wholly different from that of a year earlier. Instead of a 
minimum at 24 microns there is now a maximum at 24 microns, and the “general 
resemblance” of T. brucei to T. evansi is much increased. We are now told that 
T. rhodesiense (Stephens and Fantham) is “a distinct species, nearly related to 
T. brucei and T. gambiense,” and the conclusion drawn that “the human trypanosome 
disease of North-east Rhodesia and Nyasaland is not the disease known as 
Sleeping Sickness in Uganda and the West Coast of Africa†.” But the divergence 
between the frequency distributions of T. brucei and the human trypanosome of 
Nyasaland when accurately measured is of exactly the same order as that which 
suffices to demonstrate the identity of the human Nyasaland trypanosome and 
T. rhodesiense. Thus the two cases in which divergence is asserted, i.e. (i) T. brucei 
and T. evansi, (ii) T. brucei and T. rhodesiense, seem to be differentiated largely on 
the basis of unanalysed statistical evidence of a nature precisely like that which in 
other cases is interpreted to mean close “general resemblance” or “sameness.” 
We do not feel that we are in the possession of independent evidence of differentiation 
which would enable us to test how far statistical divergency corresponds to 
recognised morphological differences of strain, a fundamental requisite if we are 
to interpret as “sameness” a statistical divergence of an extremely high order. 


In concluding these introductory remarks we must refer to the types of 
trypanosome in Nyasaland recognised by Sir David Bruce and his colleagues as 
distinct on other grounds than numerical measurements. They are: 


(i) T. brucei vel rhodesiense. This is said to be the cause of the human 
trypanosome disease of Nyasaland. The modal length appears to be 24 to 25 
microns‡. According to Bruce and colleagues T. gambiense appears to have a 
mode of 20 microns, but there is evidence for a submode at 26. 


(ii) T. pecorum. This is said to be the cause of trypanosome diseases of 
domestic animals in both Uganda and Nyasaland. The modal length varies from 
13 to 14§. There is no statistical evidence of bimodality. 


(iii) T. simiae. This attacks monkey, goat and warthog. Oxen, dogs, white 
rats, etc., are said to be immune. The length distribution appears to be very 
homogeneous and with a single mode at 18 microns||. 


* R. S. Proc. Vol. 85, B, p. 431, 1912. 

† R. S. Proc. Vol. 85, B, p. 433, 1912. In 1913, however, we find that “there is some reason for the 
belief that T. rhodesiense and T. brucei are one and the same species,” see Sir David Bruce and others, 
R. S. Proc. Vol. 86, B, p. 407. 

‡ R. S. Proc. Vol. 84, B, p. 331. Stephens and Fantham’s measurements on T. rhodesiense 
suggest modes at 20 and 26. Ibid. Vol. 85, B, p. 231. The double mode, roughly 18 to 20 and 
28 to 29, appears in the Zululand (1894) and Uganda (1909) strains of T. brucei. Ibid. Vol. 83, 
B, p. 12. 

§ R. S. Proc. Vol. 82, B, p. 468, and Vol. 87, B, p. 14. 

|| R. S. Proc. Vol. 85, B, p. 477, and Vol. 87, B, p. 48. 


92 A Study of Trypanosome Strains 


(iv) T. caprae. This is found in waterbuck, ox, goat and sheep. The distribution 
of length is apparently homogeneous and the mode at 25 microns*. 

I leave out of account several forms of trypanosome referred to by Sir David 
Bruce and colleagues, e.g. T. vivax, T. uniforme, T. ingens, etc., of which no large 
series of measurements were at my disposal. 


With the exception of T. simiae, which occurs in the warthog, the above 
trypanosomes appear to be found generally in the wild game and all of them are 
found in the Glossina morsitans. Sir David Bruce and his colleagues suppose the 
differentiation into these classes to precede the consideration of individual strain, 
but the exact modus differentiationis is not clear from the memoirs. 


(3) Method of Investigation. The actual formula employed in the present 
investigation is very simple and can be applied by anyone able to do ordinary 
arithmetic. If N and N′ be the sizes of two samples and the corresponding 
frequencies be 

$$f_1,\ f_2,\ f_3,\ \ldots,\ f_p,\ \ldots,\ f_s,$$
$$f'_1,\ f'_2,\ f'_3,\ \ldots,\ f'_p,\ \ldots,\ f'_s,$$

where $f_p$, $f'_p$ are the frequencies falling in the p-th category, then if 

$$\chi^2 = \sum_{p=1}^{s} \frac{N N' \left( \dfrac{f_p}{N} - \dfrac{f'_p}{N'} \right)^2}{f_p + f'_p}$$

be calculated, the probability P that the observed or a greater divergence between 
the two series would arise from sampling the same population is obtained by 
determining P from χ² by my method of testing “goodness of fit.” This method 
was first published in the Phil. Mag., Vol. 50, p. 157, 1900. The shortest method 
of actually determining P is by aid of Palin Elderton’s tables for P with argument 
χ² issued in Biometrika, Vol. I. p. 155, 1902. This is the process used in the 
measurements of sameness and divergence provided below. 
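The formula of this section can be written out in a few lines. The sketch below also checks it against the ordinary χ² of a two-row contingency table, Σ (observed − expected)² / expected, with which it is algebraically identical. The two frequency rows are invented for illustration; the function names are hypothetical.

```python
def two_sample_chi2(f, g):
    """Sum over classes of N*N'*(f_p/N - f'_p/N')**2 / (f_p + f'_p)."""
    n, m = sum(f), sum(g)
    return sum(n * m * (a / n - b / m) ** 2 / (a + b)
               for a, b in zip(f, g) if a + b)

def contingency_chi2(f, g):
    """Ordinary chi-square of the 2 x s table with rows f and g."""
    n, m = sum(f), sum(g)
    total = n + m
    chi2 = 0.0
    for a, b in zip(f, g):
        col = a + b
        if col == 0:
            continue  # a class empty in both samples carries no information
        exp_a, exp_b = n * col / total, m * col / total
        chi2 += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return chi2

# Hypothetical frequencies on six classes; N = 120 and N' = 200.
first = [4, 12, 30, 41, 25, 8]
second = [10, 35, 52, 60, 33, 10]

print(two_sample_chi2(first, second), contingency_chi2(first, second))
```

The two figures agree to rounding error, so any routine for the χ² of a contingency table may be used in place of the formula as printed.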


(4) On the Probability of the identity of the Strains discussed by Sir David 
Bruce and others. 


(a) I take first the question of the “sameness” of the Wild-game strains of 
trypanosomes as isolated from five antelopes—reedbuck, waterbuck, oribi, and two 
hartebeeste. Sir David Bruce and others discuss these strains in a paper† of 
February, 1912, and conclude, apparently from the statistical data, that “the five 
Wild-game strains resemble each other closely and all belong to the same species.” 


Now these Wild-game strains have a distinct advantage for they are all 
obtained from the trypanosomes ultimately taken from the rat as host; they were 
passed from the infected antelope through healthy goat, monkey or dog, which 

* R. S. Proc. Vol. 86, B, p. 278. 


† R. S. Proc. Vol. 86, B, p. 407, 1913. In the Table p. 405 for 2500 trypanosomes under the 
heading 31 microns read a frequency of 33 not 53. 
became infected, to the rat. The frequencies of lengths of the trypanosomes in 
microns were as follows : 


From Rat: length classes 15 to 35 microns. 
Rows: Hartebeeste (1); Hartebeeste (2); Oribi; Waterbuck; Reedbuck; Mzimba (Donkey) Strain. 
[The cell frequencies of this table are not preserved in this copy.] 

I questioned first whether the strains found in the two Hartebeeste were the 
same; they give 

χ² = 108.69, and therefore P < .000,000,1. 


In other words not once in 10,000,000 trials would two such divergent samples 
arise if the Hartebeeste strains were samples of the same population. I now 
compare the Waterbuck and the Oribi; these provide χ² = 109.25 and P < .000,000,1, 
and again the extraordinary divergence, not the sameness, is the statistical 
feature. The reader may rest assured that equally incompatible results arise 
when we compare the other antelopes. Statistically we are compelled to assert 
either that the trypanosome strains in these different antelopes were different 
species, or that, not only the infected species of antelope, but the individual 
antelope of the same species (as in the case of the two Hartebeeste) immensely 
modifies the strain of trypanosome. In short not the “sameness” of the strains, 
but their great statistical divergence is the fact which impresses itself on the 
biometrician. No biometrician could possibly accept the view of Sir David Bruce 
and his colleagues that* : 


“It is evident from these tables and charts that the various strains of this 
trypanosome, as they occur in wild game are remarkably alike. This is what 
might be expected. Here the trypanosome is at home; it is leading a natural 
life. It may be supposed to be saved from variation by constantly passing and 
repassing between the antelope and the tsetse fly.” 


Our authors, it will be noted, directly appeal for “likeness” of strains to the 
tables and charts. 


With these immense measures of statistical differentiation, we ask: what would 
be the values of χ² and P, if examples of differentiated strains of trypanosomes could 
be found? If differences of host or treatment can produce these wide divergences, 
how without a preliminary study of the same strain in different hosts and under 
different treatments can we be certain whether these large divergences mean the 
same strain differently treated, or different species of trypanosomes ? 


* R. S. Proc. Vol. 86, B, p. 406. 
(b) The next comparison I make is between the Mzimba (Donkey) Strain 
taken through rats and the above wild-game strains. I have added the data for 
the Mzimba Strain to the last table (p. 93): it is given by Sir David Bruce and 
others in a paper on the Mzimba Strain*. I compare the Reedbuck and the 
Mzimba (Donkey) strains first. We find: 


χ² = 53.37, P = .000,05. 


Thus only once in 20,000 trials would a divergence as great as this arise, if the 
two strains were samples from the same population. 
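The translation of a probability into “once in so many trials” is simply the reciprocal of P. A minimal sketch, using three values of P quoted in this paper (the function name `once_in` is hypothetical):

```python
def once_in(p):
    """Average number of trials per occurrence for an event of probability p."""
    return 1.0 / p

# Values of P quoted in the text: the crania, the Reedbuck and Mzimba strains,
# and the Nyasaland strain (b) with T. rhodesiense.
for p in (0.25, 0.00005, 0.00001):
    print(f"P = {p:g}: about once in {once_in(p):,.0f} trials")
```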


The results of comparing the Mzimba strain with Waterbuck and Hartebeeste (1) 
give respectively 

χ² = 114.23, P < .000,000,1, 
and χ² = [illegible], P [illegible]. 


These give for practical purposes impossibility of a common source, thus still 
further demonstrating that the marked feature of the wild-game and Mzimba 
strains is divergence, not sameness. 


Sir David Bruce and his colleagues write†: “The trypanosome of the Mzimba 
strain is the same species as that occurring in the wild-game inhabiting the 
Proclaimed Area, Nyasaland.” In an earlier paper a diagram‡ is given of the 
frequency distribution of 3600 trypanosomes of Human strain taken from the rat 
alone. These are drawn from four native cases of sleeping sickness in Nyasaland 
and from one European case from Portuguese East Africa. As the individual 
cases for the rats alone are not given, they have had to be read off the percentage 
diagram, but the frequencies must be very nearly correct. This Human 
strain may be compared with the T. rhodesiense, the T. brucei, the Mzimba 
(Donkey) strain and a strain obtained from a native woman suffering from 
“Kaodzera,” the so-called sleeping sickness of Nyasaland. The frequencies of 
these five strains are given in the following table. I first compare the trypanosomes 
of Nyasaland given as (b) above with T. brucei and T. rhodesiense, for this 
is the comparison made by the authors themselves§. 


Taking the trypanosomes of Nyasaland (b) and the T. brucei as figured in 
percentage curves by Bruce and others, we have 


χ² = 72.17, P < .000,000,1, 


or it is impossible to ascribe any degree of sameness to these two strains. We 
now compare the Nyasaland strain (b) with T. rhodesiense, and find 


χ² = 69.95, P = .000,01; 


* R. S. Proc. Vol. 87, B, p. 31, 1913. 

† R. S. Proc. Vol. 87, B, p. 34. 

‡ R. S. Proc. Vol. 86, B, p. 301. 

§ R. S. Proc. Vol. 85, B, pp. 431 and 433. 

thus once in 100,000 trials two such divergent samples might be drawn. Although 
there is less divergence than in the case of T. brucei and Nyasaland (b), it is idle 
to speak of such a degree of divergence as sameness. 


Length in Microns. 

                        | 12 | 13 | 14 | 15 | 16 | 17 |  18 |  19 |  20 |  21 |  22 |  23 |  24 |  25 |  26 |  27 |  28 |  29
Mzimba (Donkey) (a)     |  — |  — |  — |  — |  2 | 14 |  41 |  91 |  79 |  56 |  53 |  38 |  39 |  22 |  19 |  16 |  15 |   9
Human, Native Woman (b) |  — |  — |  1 |  4 | 19 | 42 |  63 |  81 |  75 |  91 |  65 |  66 |  93 |  91 | 107 | 110 | 104 |  87
Human, mixed (c)        |  — |  — |  — |  1 |  4 | 46 | 111 | 159 | 219 | 288 | 312 | 365 | 359 | 314 | 314 | 231 | 218 | 198
T. brucei               |  — |  5 |  8 | 14 | 17 | 40 |  63 |  55 |  66 |  63 |  75 |  87 |  93 |  80 |  82 |  72 |  50 |  38
T. rhodesiense          |  1 |  3 | 10 | 19 | 29 | 35 |  67 |  54 |  92 |  51 |  74 |  56 |  68 |  59 |  85 |  61 |  72 |  50

Length in Microns (continued). 

                        |  30 |  31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | Totals | Remarks
Mzimba (Donkey) (a)     |   2 |   2 |  2 |  — |  — |  — |  — |  — |  — |  — |    500 | R. S. Proc. Vol. 87, B, p. 31. Rats only.
Human, Native Woman (b) |  49 |  27 | 23 | 13 |  7 |  1 |  1 |  — |  — |  — |   1220 | R. S. Proc. Vol. 85, B, p. 427. Various hosts.
Human, mixed (c)        | 132 | 125 | 90 | 59 | 30 | 13 |  8 |  2 |  2 |  — |   3600 | R. S. Proc. Vol. 86, B, p. 301. Read from diagram. Rats.
T. brucei               |  27 |  26 | 18 | 11 |  4 |  4 |  — |  — |  2 |  — |   1000 | R. S. Proc. Vol. 84, B, p. 331. Read from diagram.
T. rhodesiense          |   ? |   ? |  ? |  ? |  ? |  ? |  ? |  ? |  ? |  ? |   1000 | R. S. Proc. Vol. 85, B, p. 227. Various hosts.

(Entries marked ? are illegible in this copy.) 
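As a check on the arithmetic, the comparison of the Nyasaland strain (b) with T. brucei can be recomputed from the rows of the table. The sketch below applies the formula of section (3) to the two rows as printed; the function name is hypothetical, and dashes in the table are entered as zero.

```python
def two_sample_chi2(f, g):
    """Two-sample chi-square of section (3) for rows on common length classes."""
    n, m = sum(f), sum(g)
    # Classes empty in both samples contribute nothing and are skipped.
    return sum(n * m * (a / n - b / m) ** 2 / (a + b)
               for a, b in zip(f, g) if a + b)

# Frequencies for lengths 12 to 39 microns, read from the table.
human_b = [0, 0, 1, 4, 19, 42, 63, 81, 75, 91, 65, 66, 93, 91,
           107, 110, 104, 87, 49, 27, 23, 13, 7, 1, 1, 0, 0, 0]
t_brucei = [0, 5, 8, 14, 17, 40, 63, 55, 66, 63, 75, 87, 93, 80,
            82, 72, 50, 38, 27, 26, 18, 11, 4, 4, 0, 0, 2, 0]

chi2_b_brucei = two_sample_chi2(human_b, t_brucei)
print(f"chi-square = {chi2_b_brucei:.2f}")
```

The result comes out close to the printed value 72.17, which also confirms that the two rows have been read correctly from the table.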


To further establish our point let us compare the Human strain (c) for 3600 
trypanosomes with the T. rhodesiense. Here χ² = 325.47 leading to P < .000,000,01. 
In other words the great degree of divergence for the case of the Nyasaland native 
woman is exceeded at least a thousand times, when we take the big example of 
four natives and one European. 

Sir David Bruce and his colleagues write of these strains : 

“(1) The trypanosome of the human trypanosome disease of Nyasaland is 
T. rhodesiense (Stephens and Fantham).” In other words the P = .000,01 is interpreted 
as sameness. 


“(2) This is a distinct species, nearly related to T. brucei and T. gambiense, 
but more closely resembling the former than the latter.” In other words they at 
this date distinguished between T. brucei and T. rhodesiense*, and as a result of 
this distinction proposed to call the human trypanosome disease of North-east 
Rhodesia and Nyasaland by the name “Kaodzera” as not being identical with the 
sleeping sickness of Uganda and the West Coast of Africa. If we, however, 
compare T. brucei and T. rhodesiense we find χ² = 46.83 and P = .019. In other 


* R.S. Proc. Vol. 85, B, p. 433, 1912. 
words once in about 50 trials we might expect to get two samples from the same 
population as divergent or more divergent than the distributions found for 
T. brucei and T. rhodesiense. We have in fact in the cases of these two trypanosomes 
reached our first instance of comparative sameness, and the statistics should 
have shown Sir David Bruce and his colleagues that T. brucei and T. rhodesiense 
were relatively the same, and though both differed from the human trypanosome 
of Nyasaland widely, the approach to T. rhodesiense was only slightly closer. 


The accordance, speaking in a relative sense, of T. rhodesiense and T. brucei 
was asserted by Stephens and Fantham in March, 1912*. In May, 1912, Bruce 
and others, speaking of the T. rhodesiense, term it a distinct species; in February, 
1913, they say, although without publishing further frequency distributions, 
that “There is some reason for the belief that T. rhodesiense and T. brucei 
(Plimmer and Bradford) are one and the same species,”† and in a further paper of 
the same month, “Evidence is accumulating that T. rhodesiense and T. brucei 
(Plimmer and Bradford) are identical‡.” In May, 1913 (R. S. Proc. Vol. 87, B, 
p. 34), we are told that the Mzimba strain is identical with the wild-game strain 
and that “it has already been concluded that this species is T. brucei vel T. rhodesiense.” 
As far as the statistics of the subject go the only really weighty evidence 
for the identity is that of 1912, on which, without statistical analysis, the 
distinction between the two species was asserted. 


(c) We will next consider the possible identification of T. gambiense with 
T. rhodesiense and with T. brucei. 


The second identification is suggested by Sir D. Bruce and others in the words§: 


“Whether these slight differences are fundamental or only accidental it is 
impossible at present to say, but enough has been written to show that Trypanosoma 
gambiense and Trypanosoma brucei approach each other very closely in 
shape and size.” 


The following table|| provides the data for T. gambiense to be compared with 
the distribution of T. rhodesiense ranging from 12 to 39 in the last table. 


Microns. 

T. gambiense: frequency distribution of lengths, total 1000. 
[The class frequencies of this table are illegible in this copy.] 

The trypanosomes are from a variety of hosts. 


For the 28 classes we have χ² = 140.27 and P < .000,000,1. The chief point 
therefore is the complete divergence, not the resemblance of the two series. 


* R. S. Proc. Vol. 85, p. 238, 1912. 
† R. S. Proc. Vol. 86, B, p. 407. 
‡ R. S. Proc. Vol. 86, B, p. 302. 
§ R. S. Proc. Vol. 84, B, p. 332. 
|| R. S. Proc. Vol. 84, B, p. 330. 


KARL PEARSON 97 


Stephens and Fantham, who term their work a “biometric study,” speak of 
“the general resemblance between the curves representing the measurements of 
these three trypanosomes (T. gambiense, T. rhodesiense, T. brucei).” They continue: 
“We do not consider, however, that identity of measurement would 
necessarily imply identity of species. We still believe that the difference in 
internal morphology, namely the presence of the posterior nucleus, is sufficient to 
separate T. rhodesiense both from T. gambiense and T. brucei*.” As a matter of 
fact the “biometric study” of the data does not indicate identity in the measurements, 
but confirms the result of internal morphology by proclaiming wide 
differentiation†. 


(d) We can now compare T. brucei and T. gambiense. Of these Sir David 
Bruce writes: “Whether these slight differences are fundamental or only accidental 
it is impossible at present to say, but enough has been written to show 
that Trypanosoma gambiense and Trypanosoma brucei approach each other very 
closely in size and shape‡.” The biometric commentary on this is that for length 
of the two series χ² = 126.52, giving P < .000,000,1 and that as far as size is 
concerned the samples differ immeasurably, i.e. far beyond the limits of the 
calculated tables of P. 


We should thus conclude, merely from the statistical evidence, for close sameness 
in T. brucei and T. rhodesiense but for marked divergence of both from 
T. gambiense. 


* R. S. Proc. Vol. 85, B, p. 233. 

† In a later section of this memoir I show that Stephens and Fantham have been markedly biased 
in their judgment of even and odd units of measurement (p. 129 below), and that the recognition of 
this makes a wide difference in the goodness of fit of my resolution into components to their data for 
T. rhodesiense. It seems desirable therefore to inquire whether this bias affects the test of “sameness” 
of T. rhodesiense with T. gambiense, T. brucei, and the Human strains (b) and (c), see the Tables 
pp. 95-6. The data were accordingly classified into groups of two microns, starting with 12 and 13, 
14 and 15, etc., so as to get rid of the even bias as far as possible, and we find: 

                                    |       Old Unit Ranges         |     New Two Unit Ranges
Strains compared                    |  n |   χ²   |       P         |  n |   χ²   |       P
T. rhodesiense and T. gambiense     | 28 | 140.27 | < .000,000,1    | 14 | 118.73 | < .000,000,1
T. rhodesiense and T. brucei        | 28 |  46.83 |   .019          | 14 |  25.76 |   .018
T. rhodesiense and Human strain (b) | 28 |  69.95 |   .000,01       | 14 |  45.92 |   .000,06
T. rhodesiense and Human strain (c) | 28 | 325.47 | < .000,000,01   | 14 | 253.37 | < .000,000,01

The bias towards even numbers of Stephens and Fantham has thus not substantially influenced our 
results, which still show the relative likeness of T. rhodesiense and T. brucei, and the marked divergence 
of the former from T. gambiense and the human strains. 

‡ R. S. Proc. Vol. 84, B, p. 332. 
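The regrouping into two-micron ranges described in the footnote may be sketched as follows. As the T. rhodesiense row is not fully legible in this copy, the illustration uses the rows for Human strain (b) and T. brucei from the table on p. 95 instead; the regrouped χ² for that pair is not given in the paper, so no printed value is available for comparison. Function names are hypothetical.

```python
def two_sample_chi2(f, g):
    """Two-sample chi-square of section (3) for rows on common classes."""
    n, m = sum(f), sum(g)
    return sum(n * m * (a / n - b / m) ** 2 / (a + b)
               for a, b in zip(f, g) if a + b)

def pair_classes(freqs):
    """Merge adjacent unit classes (12 with 13, 14 with 15, ...) into two-unit ranges."""
    if len(freqs) % 2:
        raise ValueError("expected an even number of classes")
    return [freqs[i] + freqs[i + 1] for i in range(0, len(freqs), 2)]

# Frequencies for lengths 12 to 39 microns, from the table on p. 95.
human_b = [0, 0, 1, 4, 19, 42, 63, 81, 75, 91, 65, 66, 93, 91,
           107, 110, 104, 87, 49, 27, 23, 13, 7, 1, 1, 0, 0, 0]
t_brucei = [0, 5, 8, 14, 17, 40, 63, 55, 66, 63, 75, 87, 93, 80,
            82, 72, 50, 38, 27, 26, 18, 11, 4, 4, 0, 0, 2, 0]

unit_chi2 = two_sample_chi2(human_b, t_brucei)
paired_chi2 = two_sample_chi2(pair_classes(human_b), pair_classes(t_brucei))
print(len(pair_classes(human_b)), round(unit_chi2, 2), round(paired_chi2, 2))
```

Merging classes can never raise χ², which is in keeping with the fall in every line of the footnote table when 28 classes are reduced to 14.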
(e) It seemed well worth while to investigate how far the two Nyasaland 
strains of Human Trypanosomes given in the table on p. 95 agree or differ. The 
first (b) of these strains from a native woman of Nyasaland may be compared with 
(c) a compound strain from four natives and a European. We find 


χ² = 172.36 
giving P < .000,000,1. 


In other words, the two Nyasaland strains from human beings are indefinitely 
differentiated. I now compare the Mzimba (Donkey) strain* (a) with human 
strains (b) and (c); we find: 


for (a) and (b) χ² = 223.16 giving P certainly < .000,000,01; 
for (a) and (c) χ² = 348.55 giving P certainly < .000,000,01. 


Thus the trypanosome strain found in the donkey appears to be absolutely 
incomparable with that found in man in Nyasaland, just as the strain found 
in the donkey differed from that found in wild-game. 
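The three comparisons of this paragraph can be recomputed together from the rows (a), (b) and (c) of the table on p. 95. The sketch below is an illustration only: the function name is hypothetical, dashes are entered as zero, and the results should land within about 0.1 of the printed values, the small residue being presumably rounding in the original hand computation.

```python
from itertools import combinations

def two_sample_chi2(f, g):
    """Two-sample chi-square of section (3) for rows on common classes."""
    n, m = sum(f), sum(g)
    return sum(n * m * (x / n - y / m) ** 2 / (x + y)
               for x, y in zip(f, g) if x + y)

# Frequencies for lengths 12 to 39 microns, from the table on p. 95.
strains = {
    "a": [0, 0, 0, 0, 2, 14, 41, 91, 79, 56, 53, 38, 39, 22,
          19, 16, 15, 9, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0],
    "b": [0, 0, 1, 4, 19, 42, 63, 81, 75, 91, 65, 66, 93, 91,
          107, 110, 104, 87, 49, 27, 23, 13, 7, 1, 1, 0, 0, 0],
    "c": [0, 0, 0, 1, 4, 46, 111, 159, 219, 288, 312, 365, 359, 314,
          314, 231, 218, 198, 132, 125, 90, 59, 30, 13, 8, 2, 2, 0],
}

# Every pair of strains, keyed by the pair of labels.
pairwise = {(p, q): two_sample_chi2(strains[p], strains[q])
            for p, q in combinations(strains, 2)}
for (p, q), value in pairwise.items():
    print(f"({p}) and ({q}): chi-square = {value:.2f}")
```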


(f) We may now turn to a memoir† by Sir David Bruce and others comparing 
the Mvera cattle strain, the wild-game strain, and the wild Glossina 
morsitans strain. They give on p. 18 of that paper the graphs for 500 specimens 
of T. pecorum, the wild-game strain, and of the wild Glossina morsitans strain taken 
from a variety of hosts. The following are the frequencies: 


Microns. 

Strain                   |  9 | 10 | 11 | 12 |  13 |  14 |  15 | 16 | 17 | 18 | 19 | Totals
Mvera Cattle Strain      | [row garbled in this copy; legible entries, in order: 15, 64, 101, 186 (?), 114, 59] |    500
Wild-Game Strain         |  — |  — |  2 | 34 |  85 | 172 | 119 | 63 | 22 |  3 |  — |    500
Wild G. morsitans Strain |  1 |  4 | 16 | 42 | 129 | 147 | 103 | 42 | 15 |  1 |  — |    500


We compare first Mvera cattle strain with the wild-game strain and find for 
our 10 categories 
χ² = 34.554, P = .000,243. 


This is a relatively low degree of divergence considering that P has been running 
into 1 in 10,000,000! But it means that if these two strains were samples of one 
and the same population, we should only expect two such divergent samples to 
occur 1 in 4000 trials. 


* This Mzimba strain of trypanosome is discussed in a paper headed: “Morphology of the various 
strains of Trypanosome causing Disease in Man in Nyasaland. The Mzimba Strain” (R. S. Proc. 
Vol. 87, B, p. 26); it is said to be of the Nagana type and is identified by Sir David Bruce and 
colleagues with T. brucei vel rhodesiense, the source of the human trypanosome disease. 

† R. S. Proc. Vol. 87, B, p. 4. 

Next we find for Mvera cattle strain and the wild Glossina morsitans strain, 
χ² = 40.508, or P = .000,008, 


or only once in 125,000 trials would a pair of samples so divergent arise when 
testing the same material. 


Lastly, testing the resemblances of wild-game strain and wild G. morsitans 
strain, we find 
χ² = 35.41, or P = .000,2, 
not such a gigantic divergency as we have found in many cases, but a difference 
so great that it only occurs once in 5000 trials requires explanation as divergency 
and cannot be used as an argument for “sameness.” 
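This last value of χ² can be recomputed from the two rows of the table that are fully legible. In the sketch below the 17-micron entry of the wild-game strain is taken as 22, the figure that makes the row total 500 (that entry is garbled in the printed table of this copy); the function name is hypothetical.

```python
def two_sample_chi2(f, g):
    """Two-sample chi-square of section (3) for rows on common classes."""
    n, m = sum(f), sum(g)
    return sum(n * m * (a / n - b / m) ** 2 / (a + b)
               for a, b in zip(f, g) if a + b)

# Lengths 9 to 19 microns. The 17-micron wild-game entry is assumed to be 22,
# the value that brings the row to its stated total of 500.
wild_game = [0, 0, 2, 34, 85, 172, 119, 63, 22, 3, 0]
g_morsitans = [1, 4, 16, 42, 129, 147, 103, 42, 15, 1, 0]

chi2_game_fly = two_sample_chi2(wild_game, g_morsitans)
print(f"chi-square = {chi2_game_fly:.2f}")
```

With equal totals the formula reduces to the sum of (f − f′)² / (f + f′) over the ten occupied classes, and the result agrees with the printed 35.41.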


It will thus be quite clear that as far as the measurements of length go, there 
is wide divergence to be accounted for between the trypanosomes found in the 
cattle, the wild-game and the tsetse fly, and that statistically this divergence is the 
remarkable feature. Yet the conclusion of Sir David Bruce and his colleagues, 
arguing very largely from the frequency distributions, is that “The Mvera cattle 
strain, the wild-game strain and the wild G. morsitans strain belong to the same 


species of trypanosome, T. pecorum*.”


It will be seen that actual statistical analysis does not in any way confirm the 
bulk of the conclusions reached by Sir David Bruce and his collaborators. The 
strains may or may not be ultimately of like origin, but what is quite clear from 
the analysis is that, if we are to rely on the measurements, then it is the diver- 
gence, not the sameness of these strains, which should have been emphasised. 
No stronger evidence could be deduced of the danger of appeal to statistics 
when the statistics are not handled by the trained statistician. The mere appeal 
to the resemblance of frequency curves given in the form of percentages, often 
based on widely different totals, is an only too common error of medical investi- 
gations ; it is by no means confined to the Scientific Commission of the Royal 
Society, Nyasaland. But it has recently become so marked a feature of Series B 
of the Proceedings of the Royal Society, that a vigorous protest is really needful. 
Thus in the very last part issued (Vol. 87, B, p. 89) occurs a paper on “ The 
Trypanosomes causing Dourine.” In this paper there may be microscopic evidence 
to differentiate the strains A, B and C dealt with; on that I cannot express an 


* A further conclusion is also reached (Ibid. p. 26): “T. pecorum, Nyasaland, is identical with the
species found and described in Uganda.” Unfortunately the species found in Uganda is dealt with
in a paper (R. S. Proc. Vol. 82, B, p. 468) which provides no frequency distributions, and does not tell
us the total number on which the mean length, 13.3 microns, is based. The mean value of the
T. pecorum, Nyasaland, is 13.954 (R. S. Proc. Vol. 87, B, p. 3) and the standard deviation is 1.393 in
microns, thus the probable error of the mean is .67449 × .0623. Assuming the Uganda trypanosome to
be the same strain and to have the same variability as the T. pecorum, Nyasaland, the difference of the
means = .654, with a probable error of .67449 × √2 × .0623 = .67449 × .088, thus the deviation of the means
is 7.73 times its standard deviation. A deviation so great would only occur about once in 4 × 10" trials,
i.e., would be practically impossible if the two strains were identical. Here again it is excessive
divergence, not sameness, which the statistics indicate.


opinion. But on pp. 92–3 percentage frequency curves are drawn for the three
strains, and the following remark is made : 


“A survey of the curves obtained by plotting out in percentages the various 
lengths of trypanosomes encountered in each of the three strains is of interest. 
It will be observed that in the case of rats the curves of each of the strains corre- 
spond fairly closely.” 

Now what do the authors mean by “fairly closely”? In their conclusions 
they identify B and C and differentiate A. Unfortunately they have not given 


their actual frequencies, and I have had to endeavour to reconstruct them from 
the percentage curves. There results for the rat-data: 


Microns.

                       | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25  | 26  | 27 | 28 | 29  | 30 | 31 | 32 | 33 | 34 | 35 | 36 | Totals
Berlin Strain A        |  1 |  1 | 10 |  9 | 12 | 17 | 17 | 22 | 28 | 48  | 47  | 57 | 55 | 42  | 39 | 37 | 28 | 13 |  8 |  6 |  3 | 500
Frankfurt Strain B     |  — |  — |  1 |  3 |  5 |  1 |  4 | 10 | 20 | 22* | 18* | 25 | 24 | 35* | 23 | 15 | 18 | 15 |  8 |  3 |  — | 250
East Prussian Strain C |  — |  1 |  4 |  3 |  6 | 12 | 15 | 22 | 24 | 27  | 28  | 37 | 31 | 16  | 10 |  7 |  5 |  2 |  — |  — |  — | 250


We obtain the following results: 
Strains A and B: χ² = 31.11, P = .0627,
Strains A and C: χ² = 43.37, P = .0034,
Strains B and C: χ² = 72.72, P < .000,001.


Thus to judge from rats only, B and C are far more divergent from each
other than either is from A; in other words the strain A is intermediate between
B and C and closer to B, from which it is not immensely divergent; two such
samples as A and B might, as far as the length distributions go, be drawn from 
common material once in 16 trials. 
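
The three divergences for the rat-data, where the totals are unequal (500 against 250), can be recomputed in the same way. This is a sketch under stated assumptions: the two-sample form of χ² is assumed, the frequencies are my reading of the reconstructed table, and the Frankfurt entry at 25 microns is taken as 22 (one more than the graph value 21, and the value that makes the total 250). The names `two_sample_chi2` and `strains` are hypothetical.

```python
from itertools import combinations


def two_sample_chi2(f1, f2):
    """Two-sample chi-square for distributions over the same categories;
    the totals of the two samples need not be equal."""
    n1, n2 = sum(f1), sum(f2)
    total = 0.0
    for a, b in zip(f1, f2):
        if a + b:  # skip categories empty in both samples
            total += (a / n1 - b / n2) ** 2 / (a + b)
    return n1 * n2 * total


# Frequencies for lengths 16 to 36 microns (rats only).
strains = {
    "A": [1, 1, 10, 9, 12, 17, 17, 22, 28, 48, 47, 57, 55, 42, 39, 37, 28, 13, 8, 6, 3],
    "B": [0, 0, 1, 3, 5, 1, 4, 10, 20, 22, 18, 25, 24, 35, 23, 15, 18, 15, 8, 3, 0],
    "C": [0, 1, 4, 3, 6, 12, 15, 22, 24, 27, 28, 37, 31, 16, 10, 7, 5, 2, 0, 0, 0],
}

results = {
    pair: two_sample_chi2(strains[pair[0]], strains[pair[1]])
    for pair in combinations("ABC", 2)
}
for (first, second), value in results.items():
    print(f"Strains {first} and {second}: chi-square = {value:.2f}")
```

On these readings the ordering stated in the text is recovered: B and C give much the largest χ², and A and B the smallest.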


Now of course no one suggests that a conclusion drawn from this rat-material 
is to replace one drawn from guinea-pig material, but the statistician cannot agree 
that for rats “the strains correspond very closely”; and he finds it illogical to place 
the evidence of the rat-data on one side and proceed to draw conclusions from the 
ocular inspection of the guinea-pig curves, without noticing that the conclusion is 
markedly opposed to the proper deduction from rat-data. Indeed while the
guinea-pig data† give a relatively high degree of relationship between B and C (P = .0157)
it is not as high as the rats give between A and B (P = .0627); and while the


* The values given by the percentage graphs in these cases are respectively 21, 17 and 34, and
the total appears to be 247 and not 250 as stated. Either 247 were used or the graph is in error.
The three individuals were introduced in a way calculated not to increase divergence.

† The frequency distributions for the guinea-pigs have had to be reconstructed from the percentage
curves, the necessary data not being published by the authors.
relationships of A and B (P < .000,000,1) and A and C (P < .000,001) are very
low, the origin of the second hump in the guinea-pig distribution for A requires
much more analysis and the certainty by control experiments, that it always
repeats itself, and is not the result of hitting a “pocket.”


It seems to me that any statistical analysis by modern methods of the trypano- 
some data compels us to confess that either statistical methods must be discarded 
entirely in these trypanosome investigations, or they must be pushed to their 
logical conclusion, and used as the fundamental instrument of research which can 
guide our enquiries by inference and suggestion when, and when only, it is handled 
by the trained craftsman. Thus far the use made of statistical methods seems 
merely to have confused the issues, and brave would be the man who would venture 
to say after reading this section of our present paper that any two strains discussed 
by the commission are definitely “same” or certainly differentiated. 


(5) On the Probability that the Animal in which the Trypanosome is cultivated
makes essential Differences in the Distributions of Frequency.


But the very method which casts apparent discredit on the results at present 
reached seems able to lead us to definite conclusions provided we start with it as 
the fundamental mode of investigation. Really very little inspection seems to indi- 
cate that not only the host but the period of infection materially influences the 
frequency distribution. These points have not been wholly disregarded by the in- 
vestigators in this field, but they have had no quantitative measure by which they 
could appreciate the relative influence of the various environmental factors. Nor 
indeed could the method be fully applied without experimental observations on 
trypanosomes of the same strain subjected to differential treatment. Knowing in 
such cases the quantitative divergence produced, we should be in a position to infer 
whether two strains from different sources were separate species or merely modified 
by differential environment. Until we have such quantitative measure no hypothesis 
of sameness or difference can flow from statistical treatment; nobody as yet knows 
how much to attribute to environment, how much to attribute to individuality 
of strain. 


In endeavouring to throw light on this matter we are, however, checked at 
the very start by the absence of effective material. In some cases the period of 
infectivity is not given; in others we are not always able to break up the total 
frequency by reference to the host, or to a single host. And even when we merely 
classify by one type of animal as host, we may have reduced our material to such 
small numbers that samples may be “same,” which on larger numbers would 
show the marked divergence due to the emphasis of smaller differences*. Some 
suggestive points can, however, be effectively dealt with and they are treated in 
the following paragraphs. 
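
The remark about small numbers can be made concrete. For two samples of equal size with fixed proportions in each category, χ² grows in direct proportion to the sample size, so that a difference invisible on 50 individuals may be unmistakable on 400. The sketch below uses purely hypothetical proportions (`p1`, `p2` are invented for illustration and are not trypanosome data) and assumes the usual two-sample form of χ².

```python
def two_sample_chi2(f1, f2):
    """Two-sample chi-square for distributions over the same categories."""
    n1, n2 = sum(f1), sum(f2)
    total = 0.0
    for a, b in zip(f1, f2):
        if a + b:
            total += (a / n1 - b / n2) ** 2 / (a + b)
    return n1 * n2 * total


# Hypothetical category proportions for two slightly different populations.
p1 = [0.10, 0.25, 0.35, 0.20, 0.10]
p2 = [0.04, 0.18, 0.36, 0.27, 0.15]


def chi2_at(sample_size):
    """Chi-square when both samples have the given size and show exactly
    the proportions p1 and p2 (fractional frequencies are allowed)."""
    return two_sample_chi2(
        [p * sample_size for p in p1],
        [p * sample_size for p in p2],
    )


small = chi2_at(50)
large = chi2_at(400)
print(f"n = 50:  chi-square = {small:.2f}")
print(f"n = 400: chi-square = {large:.2f}")
print(f"ratio = {large / small:.1f}")
```

The same percentage curves thus yield an eightfold larger χ² on the larger samples, which is why a comparison of percentage curves without their totals settles nothing.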


* It may not be possible to differentiate Bavarian from Württemberger on samples of 50 crania,
although quite possible on samples of 400. 
(a) I ask what difference is made when a strain is passed through various
animals (goat, monkey, dog, rat) or through a single animal alone. Taking the
wild-game strain discussed by Sir David Bruce and others*, we have:


Microns.

                                          | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | Totals
Wild-Game Strain (from various animals)   |
Wild-Game Strain (from a single rat, 510) |


Here we find χ² = 65.37 and P < .000,000,1. In other words the distribution of
lengths of the trypanosomes of the wild-game strain obtained from various animals
differs so enormously from that obtained from a single rat that the two cannot be
looked upon as samples of the same population. The moment this result is realised
we appreciate that (i) it is impossible to compare two strains developed in a variety
of animals unless we have previously tested on the same strain the equal valency
of these animals, (ii) a series of animals of even the same species may quite
possibly give widely divergent results from those obtained for a single animal.
Thus passing from a variety of animals in wild-game strain to a variety in wild
G. morsitans strain makes less difference (P = .000,008), although great enough,
than passing from a variety of hosts to a single rat in the wild-game strain.
This rule is not universal, but it illustrates the absolutely essential need for 
testing the effect of change of host before questioning the identity or non-identity 
of two strains. 


(b) I now turn to the Mvera cattle strain, and ask what differentiation is 
produced by the dog and goat as hosts. The data are very sparse and unless we 
get a high degree of resemblance may be worth little. They run†:


* R. S. Proc. Vol. 87, B, pp. 6 and 8. 
† R. S. Proc. Vol. 87, B, p. 3. I tested the relative interchangeability of goat and sheep in the case
of T. caprae. The data are as follows (R. S. Proc. Vol. 86, B, p. 280):


Microns. 
eee ie | | Wi tiee pee a | 
T. caprae | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 82 | Totals 
—— | | 
Goat .. |—|—] 8 | 7 | 11] 85 | 43 | 50 | 88 | 28) 27; 17) 5 | 1 | —} 260 
Sheep... | — | — J 1 | 10 | 12 | 29 | 39 | 31 | 28 | 20 5 | 3 id 1 180 | 


leading to χ² = 18.088 and P = .1133, or the resemblance is considerable although not so great as we find
between goat and dog for the Mvera cattle strain.
Microns.
| > | ] 
| DON Le | 12" TS tp \\ 15 | 16 | 17 | Totals 
| 
fs | | | 
Mvera Cattle, Goat ... | 1 Wed ala 225 26°) 19 is.) i 100 
” ” Dog | 


a eon ot lala ag ZI 8, | =— | 100 


We have χ² = 5.396 leading to P = .714, or in 71 pairs of samples out of 100
from a homogeneous population, we should get more divergent results. It follows
therefore that, as far as these small series of this strain go, goat and dog are
interchangeable as hosts.


Let us go a stage further and ask whether ox is interchangeable with goat and 
dog. The following is the frequency distribution for the trypanosomes through 
the ox: 


Microns.

                 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | Totals
Mvera Cattle, Ox | — |  — |  7 | 18 | 33 | 44 | 49 | 21 |  7 |  1 | 180


Compared with the goat strain, this gives
χ² = 9.559 and P = .3888,
and compared with the dog strain
χ² = 9.461 and P = .3973.


Thus in about two out of five trials from a same population we should get 
pairs of samples differing more than the dog and goat strains do from the ox 
strain. We conclude that while for practical purposes dog, goat and ox strains in 
the Mvera cattle trypanosomes are interchangeable, yet the dog and goat strain 
are nearly twice as much alike as the ox strain is to either. Lastly—although it 
is rather a rash proceeding—I compare rat with goat and dog. It is rash because 
only 40 trypanosomes through the rat were measured, and this is wholly inadequate 
for real determination. The frequencies for the lengths are: 


ae oe ee ee ae 
| | 
| 9 | LO tL | 12) 18 14 | 15 | 16 | 17 | Totals 
| | 
—| | are 
Mvera Cattle, Rat ... sas fe NN I | alee oy 40 
; »  Dogand Goat | 1 | 1 | 6 | 25 | 49 | 56 | 40) 21) 1 | 200 


We find χ² = 21.329 and P = .0064. The small series of rat trypanosomes
probably accounts for no smaller value of P, but the odds of 155 to 1 are
sufficient to show that rat series must not be mixed with series from the goat,
dog or ox. This confirms the view obtained for the wild-game strain, that a
strain taken through the rat as host is incomparable with strains from other 
animals. 


(c) The totals considered for one species of host in (a) and (b) are rather
small. Larger numbers are forthcoming for the so-called Mzimba strain of 
trypanosomes taken from a donkey at Mzimba. The frequencies are here*: 


| ” ” 


Microns. 
16 | 17 18/19 | 20 | 21 | 22| 23| 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | Totals 
| ; | | } | | | | le 4 | | | 
Re cal as 
Mzimba Strain, pos | 3 17 | 56 69 | 67 | 47 | 27 | 22 | 10/12] 7 4| 4 | 4|2)1 | 360 
e | 2 i 41 91 79 56) 53 | 38/39 22/19/16/15 | 9 | 2/2/21) 500 


este 

We find χ² = 25.499 and P = .0619. Thus only about once in 16 trials should
we get such a degree of divergence as the two samples present, drawing them from
the same population. This is very far from such a divergence as we have noted
in the rat and dog for the Mvera cattle strain, or in the case of rat against other
animals in the wild-game strain, which was extremely large. The only expla-
nations that occur to me here are:


(i) In the case of the wild-game strain and the Mvera cattle strain a single 
rat seems to have provided all the trypanosomes, while in the case of the Mzimba 
strain two rats were used; this might lessen the influence of individuality. 


(ii) In the case of the Mvera cattle strain and the wild-game strain the 
trypanosomes were ultimately taken from a great number of individuals. In the 
Mvera cattle case we are told that 32% of the herd were affected, and we have
some details of 16 head of cattle and 5 donkeys naturally infected†. In the wild-
game case, the wild game affected were very numerous, covering cases of eland,
reedbuck, waterbuck, bushbuck, oribi, koodoo, hartebeeste, buffalo and hyaena.
Now can we start with the hypothesis that all the individual cattle and all the
individual wild game were each bitten by a fly carrying the same strain of
trypanosome? Have we any more right to suppose a priori that one wild-
game strain of trypanosome and one cattle strain of trypanosome exist, and ask
whether these two are identical, than to ask whether the strains carried by hyaena
and hartebeeste are the same? We have already (p. 93) seen that the strains
from two hartebeeste are extremely divergent. What right have we a priori
to classify all wild-game trypanosomes together and call them a wild-game strain?
And if two antelopes, whether of the same or of different species, give widely
different results, why are the trypanosomes of oxen of the same herd or donkeys
and oxen from the same neighbourhood to be classed a priori as of one species?


* R. S. Proc. Vol. 87, B, p. 31. 
† R. S. Proc. Vol. 87, B, p. 15.
If we turn to the Mvera cattle, we find there were four sources of trypanosomes
for the ox, two for the goat, and the same two for the dog, these two sources being
two of the four cattle sources. There was only one source for the rat, but I have
not discovered how far it was identical with one of those for ox or goat*. In the
Mzimba donkey strain there was one source for dog and rat. In the wild-game
strain there were, I make out, eight sources of trypanosomes for the goat, four for
the dog, and only one for the rat.


Thus the individuality, which might be supposed to influence the result, 
because we are treating of trypanosomes in this case from a single rat, in the 
Mvera cattle case from a single rat, and in the Mzimba data from only two rats, 
may really arise from the fact that the rat strains in each case are derived from a 
single source, while the dog, goat and ox strains show a multiplicity of sources. 
The troublesome point is that the experimental part of the work has not been 
designed to answer what seem to me fundamental questions. We cannot directly 
inquire what difference the host makes because different hosts have rarely been 
treated with the strain from a unique source. We can say that dog and goat are 
interchangeable for the Mvera cattle strain, because both drew trypanosomes from 
the same two sources ; but we cannot determine whether the difference in the ox 
is due to difference of the host, or to the introduction of two more sources. Simi- 
larly the divergence between the trypanosomes from rat and from other animals for 
the wild-game strain may be due to using one rat and therefore one source, and not 
the many sources of the other animals, or it may really be due to the differentiation 
of the host. In the same way the difference between the two hartebeeste may be 
due to individuality in the same species, or to infection from different strains. 


(d) To some slight extent we may appreciate the effect of individuality by
comparing the two rats 512 and 513 in the case of the single source, the Mzimba
strain‡.


The frequencies are as follows : 
Microns. 


| | Neal ee lee | 
| Mzimba Strain | 16 | 17 | 18 | 19| 20 | 21 | 22 | 23 | 24 | 25 | 26 
| | | | 


| 
| 
| 
| 


7/1/1/2] 240 
2/1 He 260 | 


Rate? °... |—| 5 
(Rat 513. 2) 9 


{ 


The numbers are not as large as we should like; but they give 


χ² = 17.89, P = .3306.
* R. S. Proc. Vol. 87, B, pp. 2 and 15.
† R. S. Proc. Vol. 87, B, pp. 6 and 8 compared with 5. Rat from p. 8.
‡ R. S. Proc. Vol. 87, B, pp. 29 and 31.
Clearly then two samples as divergent as those found would occur on the
average once in three trials. It follows that two individual rats are really inter-
changeable and we note that the extent to which ox is interchangeable with dog
or goat for the cattle strain is very much the degree in which two rats are inter-
changeable. To judge from this single instance, individuality within the same
species of host is not very important, and when we find two hartebeeste differing
as those considered on p. 93, it seems much more likely, with the information we 
have at present got, that the hartebeeste were infected with different strains of 
trypanosome than that their individuality produced the enormous divergence 
noted. Again the sensible divergence between Mzimba strain in dog and rat on 
p. 104 is probably due to difference of host, but the enormous difference in the 
wild-game strain between a single rat and dog and goat on p. 103 is probably due 
to differences in the strains of trypanosomes in the various types of wild game 
dealt with. We may consider whether the dog and goat data for the wild-game 
strain differ sensibly. We have* 


Microns.

                       | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | Totals
Wild-Game Strain, Goat |  1 | 16 | 37 | 73 | 38 | 26 |  8 |  1 | 200
Wild-Game Strain, Dog  |  — | 12 | 31 | 57 | 50 | 24 |  6 |  — | 180


Here χ² = 6.04 and P = .5378. Thus in more than half the trials we should
obtain from homogeneous material pairs of samples more divergent than those for 
dog and goat. This confirms the view formerly expressed that as far as trypano- 
somes are concerned dog and goat are interchangeable. We cannot yet say that 
they are not interchangeable with the rat, as the mixture of strains in dog and 
goat and the uniqueness of strain in the rat may account for the marked 
divergence of the latter. Sir David Bruce and his colleagues do not appear to 
have noticed the wide divergence of the distribution of the rat from the dog and 
goat either as indicating the heterogeneity of the wild-game and the cattle strains 
of trypanosomes, or as suggesting such wide differentiation of strain by the host, that 
rat-material cannot be mixed with that from dog and goat. They do, however, 
remark of the wild-game strain: “In this the rat is not a suitable animal, since
many strains of T. pecorum have no effect on it†.” This suggests that T. pecorum
is not homogeneous and that the rat exercises a selective influence on its strains.
The suggested rejection of the rat data seems, however, to be based upon the in-
convenience of its non-infectivity, and not on what might turn out to be of great
importance, a selective influence on wild-game or cattle strains. It is not possible
to test this selective power in the present instance, as we do not actually know
how heterogeneous either the cattle or wild-game material used really was. 


* R.S. Proc. Vol. 87, B, p. 7. 
† R. S. Proc. Vol. 87, B, p. 7.
(e) If we turn to the T. pecorum strain as actually found in the tsetse fly, we
see that Sir David Bruce and his colleagues deal with these trypanosomes passed 
through a variety of animals, of which only goat and dog supply sufficient numbers 
for any even approximately accurate treatment. The data are as follows*: 


Microns.

                               |  9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | Totals
Wild G. morsitans strain: Goat |  1 |  3 | 12 | 21 | 55 | 60 | 32 | 12 |  4 |  — | 200
Wild G. morsitans strain: Dog  |  — |  — |  3 | 14 | 34 | 41 | 40 | 19 |  9 |  — | 160
Wild G. morsitans strain: Rat  |  — |  1 |  — |  3 | 22 | 28 | 19 |  6 |  1 |  — |  80


For goat and dog we find χ² = 19.518, which gives P = .0125. The resemblance
is therefore far less than we have found for goat and dog in other strains, only 
once in 80 trials from homogeneous material would two samples of such divergent 
character arise. Before we comment on this it seems desirable to compare the 
very inadequate rat data. 


For rat and goat we have
χ² = 12.201, P = .1434.
For rat and dog we have
χ² = 11.370, P = .1245.


Accordingly we see that for this material the rat strain (i) lies between the 
dog and goat strains, and (ii) is definitely interchangeable with dog and with
goat, while the dog and goat are much more divergent. Now the sparsity here of 
all the data must prevent any dogmatism; all we can reach is suggestion for 
further investigation. But the following points should be noted†. The trypano-
somes through the goats were obtained from six different goats, infected directly
from the wild fly; the trypanosomes from the dogs were obtained from only four 
different sources, namely from a monkey directly infected by the wild fly, from a 
dog directly infected, and from two goats (89 and 125), the former only of which 
is identical with one of the former six goat sources. Lastly, the rats were infected 
from one dog alone, upon which the tsetse flies had directly fed. This dog is not 
identical with one of the dog sources. Now unless we assume that all the strains
of the trypanosome found in the tsetse fly are identical (which is certainly not in
accordance with the differences found in the strains of wild game from the “fly-
country”) it is by no means certain that the trypanosomes obtained from wild
G. morsitans, through goat, dog and rat as above noted came from anything like 
the same sources. Further, the closer resemblance between rat and dog strains 


* R. S. Proc. Vol. 87, B, p. 11. 


† R. S. Proc. Vol. 87, B, pp. 10, 11, and 19 to 22.


108 A Study of Trypanosome Strains 


may simply be the result of the rat strain having been developed in the dog as 
host. The divergence between the dog and goat strain may again be solely due to 
the greater variety of sources in the goat. The data from the wild G. morsitans 
experiments seem to indicate that the observed divergences between the strain 
from rat and the strain from goat or dog may not be due to difference of host; 
but to difference of source from which the material was drawn, and to difference of 
treatment of the individual stock of trypanosomes, e.g. the number of hosts, etc.,
through which it has passed. 
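
The values of P quoted for the host comparisons can be approximately recovered from the printed χ². The sketch below is an illustration under an assumption of my own: that the degrees of freedom are one less than the number of length categories occupied in the pair of samples compared. The helper `chi2_upper_tail` is a hypothetical name; it uses the closed-form series for the upper tail of χ² with a whole number of degrees of freedom, so that no tables are needed.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-square with integer df exceeds x), from the closed-form series."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite sum in odd powers of sqrt(x).
    chi = math.sqrt(x)
    term = chi
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * density * total


# (comparison, chi-square as printed, occupied categories, P as printed)
cases = [
    ("wild-game: goat v. dog", 6.04, 8, 0.5378),
    ("G. morsitans: goat v. dog", 19.518, 9, 0.0125),
    ("G. morsitans: rat v. goat", 12.201, 9, 0.1434),
    ("G. morsitans: rat v. dog", 11.370, 8, 0.1245),
]

results = [(name, chi2_upper_tail(x, k - 1), printed) for name, x, k, printed in cases]
for name, p, printed in results:
    print(f"{name}: computed P = {p:.4f}, printed P = {printed}")
```

Under that assumption the computed and printed values agree to about two decimal places; the small differences would be expected if the printed figures were interpolated from tables.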


It seems absolutely certain that at the present time most light would be 
thrown on the conditions for asserting sameness or diversity of strains, by well 
devised experiments on strains from single sources passed through different species 
of hosts in different manners, in order to determine the exact measure of divergence 
produced by host and by treatment, and ultimately to devise a standard treatment 
for all strains which we desire to compare. 


The exact nature not only of host, but of standard treatment is most vital. We 
can demonstrate the influence of treatment at once by considering the “ percentages 
of posterior nuclear forms among short and stumpy forms” recorded by Sir David 
Bruce and his colleagues for the wild-game strain*. All the trypanosomes were 
from rats, and although the date of infection of the rat is, I think, not stated, the 
dates of first extraction will be after much the same interval, and we can therefore 
classify by date from first extraction. We find the following table: 


Wild-Game Strains.

Percentage of Posterior-Nuclear Forms among
Short and Stumpy Forms.

From first Extraction | 21% and under | 22% and over | Totals
6 days and under      |      18       |       6      |   24
7 days and over       |       6       |      18      |   24
Totals                |      24       |      24      |   48


Using Sheppard’s formula for the four-fold table, we have for tetrachoric r
r = .707,


or, the correlation between this character of the trypanosome and the time 
after infection of extraction is very considerable. It will be obvious that in a 
standardised treatment this time of extraction will play a most important part. 
But it again is not independent of the species of trypanosome, for if we take the 
wild Glossina morsitans strains†, we find:


* R.S. Proc. Vol. 86, pp. 396—404, Tables III, VI, IX, XII and XV. 

† R. S. Proc. Vol. 86, B, pp. 410–418, Tables III, VI, IX, XII and XV. I have added one percentage
by random selection from the complete table by lot in order to give 60 cases, and save labour in 
fractionising. 
Percentage of Posterior-Nuclear Forms among
Short and Stumpy Forms.

From first Extraction | 7% and under | 8% and over | Totals
6 days and under      |      12      |     18      |   30
7 days and over       |      18      |     12      |   30
Totals                |      30      |     30      |   60

leading to r = −.309.

In other words using tsetse fly strains and not wild-game strains, but the same 
host, we find that now the correlation is negative or the longer the infection the 
smaller the percentage. Actually the five G. morsitans strains show remarkably 
irregular results compared with the results for the wild-game strains; the ex- 
tractions were spread over much the same period, 13 to 14 days on the average, 
but were somewhat more numerous for the G. morsitans. Thus even the same 
method of extraction may give widely varying results according to the nature 
of the strain producing the infection, although the host be the same. 
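
The two tetrachoric correlations can be illustrated numerically. The sketch assumes that Sheppard's formula is the median-dichotomy result r = cos(π × discordant / total), which applies when both divisions fall at the medians, as they do in tables whose margins are all equal. The G. morsitans cell counts (12, 18, 18, 12) are my reading of the partly illegible table, chosen because they agree with the 60 cases mentioned in the footnote; `sheppard_r` is a hypothetical name.

```python
import math


def sheppard_r(low_low, low_high, high_low, high_high):
    """Tetrachoric r for a four-fold table divided at both medians.

    low_low and high_high are the concordant cells (short interval with low
    percentage, long interval with high percentage); the other two cells are
    the discordant ones.
    """
    total = low_low + low_high + high_low + high_high
    discordant = low_high + high_low
    return math.cos(math.pi * discordant / total)


# Rows: 6 days and under, 7 days and over; columns: low and high percentage.
r_wild_game = sheppard_r(18, 6, 6, 18)
r_morsitans = sheppard_r(12, 18, 18, 12)

print(f"wild-game strains:    r = {r_wild_game:.3f}")
print(f"G. morsitans strains: r = {r_morsitans:.3f}")
```

The first value is cos 45° and the second cos 108°, so that the change of sign between the two groups of strains comes entirely from the discordant cells outnumbering the concordant ones in the second table.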


To the statistician who examines the frequency distributions provided by 
Sir David Bruce and his colleagues for both wild-game strains and Glossina 
morsitans strains, there can hardly remain a doubt about the heterogeneity of 
the material in each case. We have already demonstrated this statistically for 
the wild-game strains. These strains not only differ by immense differences 
inter se, but intra se they are clearly heterogeneous. Whether this heterogeneity 
is due to the mixture of separate strains, to dimorphism within the strain, or to 
the combination of material drawn from the rat at various stages of infection, it is 
not possible on the material at present available to determine finally. The same 
remarks apply with even greater certitude to the wild G. morsitans strains than to 
the wild-game strains. But we shall return to this point in the last section of this 
paper. We have already noted that Sir David Bruce and his colleagues identify— 
against the weight of the statistical evidence—the Mvera cattle strain, the wild-game 
strain and the wild G. morsitans strain as belonging to the same species T. pecorum*.
They had previously identified other strains in wild game, G. morsitans and human
beings† with T. rhodesiense which they elsewhere describe as vel brucei‡. This is
again, I hold, against the weight of statistical evidence. But it is not clear from 
the memoirs themselves what is the exact process by which an individual fly, an 
individual human being, or the blood from a specimen of wild game is credited 
with carrying a homogeneous strain. The sizes are so different in the cases of 
T. pecorum and T. simiae that there may be no difficulty in distinction, but the 
range is so great and to the statistician the material seems so heterogeneous in the 
case of T. brucei vel rhodesiense that, perhaps, a fuller description by the authors

* R. S. Proc. Vol. 87, B, p. 26. 


† R. S. Proc. Vol. 86, B, p. 42.
‡ R. S. Proc. Vol. 86, B, p. 426.
of the process of differentiation would aid him. This is of especial importance
if it should turn out, as I suspect, that the trypanosomes classed as T. brucei are 
either dimorphic, or belong to two different species. 


In another paper* we find the trypanosomes from G. morsitans, on the basis of 
their infective powers on monkey, goat and dog, resolved into T. brucei vel rhode-
siense, T. pecorum, T. simiae and T. caprae. But it is clear that the differentiation
was not done solely by infectivity, or there would have been no means of dis-
tinguishing T. brucei and T. pecorum which attack all three: monkey, dog and
goat. The question arises, whether T. pecorum, T. simiae and T. caprae being
readily identified by microscopic examination or size, the remainder was classed as 
T. brucei, in which case the question of the heterogeneity of this group, which 
appears to attack all animals, is rather supported than otherwise by this paper. 


Frequencies of the Various Strains for Length.
Length in Microns.

Strain               |  9 | 10 | 11 |  12 |  13 |  14 |  15 |  16 |  17 |  18 |  19 |  20 |  21 |  22 |  23 |  24 |  25 |  26
T. pecorum           |  2 |  6 | 42 | 193 | 452 | 618 | 453 | 178 |  51 |   5 |   — |   — |   — |   — |   — |   — |   — |   —
T. simiae            |  — |  — |  — |   — |   — |   7 |  28 |  76 |  93 | 126 |  92 |  47 |  22 |   6 |   2 |   1 |   — |   —
T. caprae            |  — |  — |  — |   — |   — |   — |   — |   — |   — |   1 |   — |   3 |   8 |  28 |  49 |  79 |  95 |  80
(i) T. rhodesiense   |  — |  — |  — |   1 |   3 |  10 |  19 |  29 |  35 |  67 |  54 |  92 |  51 |  74 |  56 |  68 |  59 |  85
(ii) T. brucei       |  … |  … |  … |   … |   … |   8 |  14 |  17 |  40 |  63 |  55 |  66 |  63 |  75 |  87 |  93 |  80 |  82
(iii) T. gambiense   |  — |  — |  — |   — |   1 |   — |   9 |  21 |  56 |  79 | 114 | 122 | 110 |  85 |  85 |  61 |  47 |  49
(iv) Mzimba Strain   |  — |  — |  — |   — |   — |   — |   — |   8 |  27 |  79 | 175 | 189 | 139 | 109 |  72 |  66 |  36 |  32
(v) G. morsitans     |  — |  — |  — |   — |   — |   — |   7 |  31 | 148 | 230 | 326 | 252 | 237 | 184 | 143 | 115 | 130 | 110
(vi) Wild Game       |  — |  — |  — |   — |   — |   — |   1 |   8 |  53 | 118 | 252 | 381 | 348 | 285 | 200 | 162 | 149 | 135
(vii) Human Strain   |  — |  — |  — |   — |   — |   1 |  10 |  41 | 154 | 325 | 494 | 528 | 577 | 512 | 525 | 511 | 464 | 425
(viii) Chituluka     |  — |  — |  — |   — |   — |   — |   1 |   8 |  48 |  81 |  78 |  71 |  44 |  46 |  56 |  53 |  98 | 120

Length in Microns (continued).

Strain               |  27 |  28 |  29 |  30 |  31 |  32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | Totals | Source
T. pecorum           |   — |   — |   — |   — |   — |   — |  — |  — |  — |  — |  — |  — |  — |  2000  | R. S. Proc. 87, B, p. 13
T. simiae            |   — |   — |   — |   — |   — |   — |  — |  — |  — |  — |  — |  — |  — |   500  | Ibid. 85, B, p. 477
T. caprae            |  68 |  57 |  24 |   9 |   2 |   2 |  — |  — |  — |  — |  — |  — |  — |   500  | Ibid. 86, B, p. 278
(i) T. rhodesiense   |  61 |  72 |  50 |  52 |  28 |  13 | 13 |  5 |  1 |  1 |  — |  — |  1 |  1000  | Ibid. 85, B, p. 227
(ii) T. brucei       |  72 |  50 |  38 |  27 |  26 |  18 | 11 |  4 |  4 |  — |  — |  2 |  — |  1000  | Ibid. 84, B, p. 331
(iii) T. gambiense   |  47 |  44 |  31 |  20 |  11 |   4 |  4 |  — |  — |  — |  — |  — |  — |  1000  | Ibid. 84, B, p. 330
(iv) Mzimba Strain   |  24 |  22 |  16 |   7 |   4 |   4 |  — |  — |  — |  — |  — |  — |  — |  1000  | Ibid. 87, B, p. 31
(v) G. morsitans     | 127 | 133 | 113 |  96 |  54 |  44 | 11 |  7 |  2 |  — |  — |  — |  — |  2500  | Ibid. 86, B, p. 419
(vi) Wild Game       | 125 | 110 |  62 |  55 |  33 |  12 |  7 |  3 |  1 |  — |  — |  — |  — |  2500  | Ibid. 86, B, p. 405
(vii) Human Strain   | 372 | 347 | 307 | 198 | 167 | 123 | 77 | 36 | 12 | 11 |  2 |  1 |  — |  6220  | Ibid. 86, B, p. 330
(viii) Chituluka     | 111 | 128 | 138 |  99 | 117 |  91 | 63 | 27 | 11 |  9 |  1 |  1 |  — |  1500  | Ibid. 86, B, p. 291


* R.S. Proc. Vol. 86, B, p. 422. 
At any rate the exact method of differentiation adopted would be of interest
to the statistician. The result of the paper is that the four species of trypanosomes 
occur in quite comparable permilles of tsetse flies caught in the sleeping sickness 
area of Nyasaland, and there is no evidence to show that they or other strains also 
may not occur side by side in the same fly or in the same specimen of wild game. 
Further, these compound strains would then appear in different proportions in the 
host. Some such hypothesis seems very needful to account for the extreme 
heterogeneity of the wild game, wild G. morsitans, and human strains as recorded 
by Sir David Bruce and his colleagues. The following table gives a comparison of 
what appear to be homogeneous strains—T. pecorum, T. simiae and T. caprae— 
with what appear statistically to be heterogeneous strains, i.e. T. brucei,
T. rhodesiense, T. gambiense, the Mzimba strain, the wild-game and wild G. 
morsitans strains of human type, and the human strains themselves. The table 


Means, Standard Deviations and Coefficients of Variation of eleven Strains
of Trypanosomes.

| Series | Mean | Standard Deviation | Coefficient of Variation |
| T. pecorum | 13.992 ± .019 | 1.2816 ± .014 | 9.16 ± .099 |
| T. simiae | 17.870 ± .050 | 1.6558 ± .035 | 9.27 ± .199 |
| T. caprae | 25.508 ± .063 | 2.1011 ± .045 | 8.58 ± .184 |
| (i) T. rhodesiense | 23.577 ± .100 | 4.6764 ± .071 | 19.83 ± .311 |
| (ii) T. brucei | 23.529 ± .094 | 4.3938 ± .066 | 18.67 ± .291 |
| (iii) T. gambiense | 22.113 ± .081 | 3.7867 ± .057 | 17.12 ± .266 |
| (iv) Mzimba Strain | 21.413 ± .063 | 2.9586 ± .045 | 13.82 ± .212 |
| (v) G. morsitans | 22.695 ± .058 | 4.3002 ± .041 | 18.95 ± .187 |
| (vi) Wild Game | 22.622 ± .047 | 3.4541 ± .033 | 15.27 ± .174 |
| (vii) Human Strain | 23.796 ± .035 | 4.1262 ± .025 | 17.34 ± .108 |
| (viii) Chituluka | 26.172 ± .084 | 4.8414 ± .060 | 18.50 ± .235 |


above, gives the means, standard deviations and coefficients of variation of these 
strains. It will be seen that the first three are of a very different character to the 
last five. The variation of the latter is about double that of the admittedly pure 
strains, and throughout the whole course of our further work this possibility of 
heterogeneity, and the differential selection of the components by the host must 
be borne carefully in mind. Great divergences do not discourage the use of 
biometric methods, and we get occasionally identities of strains which are quite 
beyond the limits of chance coincidence and which point to definite possibilities if 
only host, environment, and treatment are once effectively standardised. I propose 
to try to throw some light on these points in the remaining sections of this paper. 
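
The constants of such a table can be checked from the frequencies themselves. The following is a minimal modern sketch in Python, not part of the original memoir. It takes the T. pecorum frequencies recorded in this paper and assumes that the printed standard deviation carries Sheppard's correction for grouping into one-micron classes, and that the ± values are probable errors (0.6745 times the standard error); both assumptions are inferred from the fact that they reproduce the printed figures. The function name `grouped_moments` is purely illustrative.

```python
import math

# Length in microns -> frequency for T. pecorum (2000 trypanosomes),
# as recorded in the frequency table of this paper.
pecorum = {9: 2, 10: 6, 11: 42, 12: 193, 13: 452,
           14: 618, 15: 453, 16: 178, 17: 51, 18: 5}


def grouped_moments(freq, width=1.0, sheppard=True):
    """Mean, S.D., coefficient of variation and probable error of the mean
    for a grouped frequency distribution.

    Assumption: Sheppard's correction (width**2 / 12) is subtracted from the
    raw second moment, which is what reproduces the printed S.D. of 1.2816.
    """
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n
    var = sum(f * (x - mean) ** 2 for x, f in freq.items()) / n
    if sheppard:
        var -= width ** 2 / 12.0
    sd = math.sqrt(var)
    cv = 100.0 * sd / mean                  # coefficient of variation, per cent
    pe_mean = 0.6745 * sd / math.sqrt(n)    # probable error of the mean
    return n, mean, sd, cv, pe_mean


n, mean, sd, cv, pe_mean = grouped_moments(pecorum)
print(f"N = {n}, mean = {mean:.3f}, S.D. = {sd:.4f}, "
      f"C.V. = {cv:.2f}, P.E. of mean = {pe_mean:.3f}")
```

Without the grouping correction the same data give a standard deviation of about 1.314, so the correction is not negligible when strains are compared by their variability.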


(6) On the Probability that Strains are alike after allowance for the Host. 


(a) Luckily in certain cases the treatment has been more or less alike. Thus 
in the wild Glossina morsitans strain, the tsetse flies brought to the Laboratory 
from the “fly-country” were in one strain (I) fed on a monkey and in the case of
four other strains (II to V) fed on dogs. From these animals thus infected others
were inoculated, but in each case only the trypanosomes from a single rat were
used for purposes of measurement and comparison. The following table gives the
frequency distributions of the five strains, and chiefly on the basis of these
distributions, Sir David Bruce and his colleagues conclude that:

“The five wild Glossina morsitans strains resemble each other closely, and all
belong to the same species of trypanosome.” (p. 421.)


Wild G. morsitans Strains*.

Investigating the statistical measure of resemblance in the usual way we have
the following series of results:

Strains I and II: χ² = 81.88, P < .000,000,1,
Strains I and III: χ² = [illegible], P < .000,000,01,
Strains I and IV: χ² = [illegible], P < .000,000,1,
Strains I and V: χ² = 115.77, P < .000,000,1,
Strains II and III: χ² = 328.12, P < .000,000,01,
Strains II and IV: χ² = 184.88, P < .000,000,01,
Strains II and V: χ² = 208.79, P < .000,000,01,
Strains III and IV: χ² = 122.79, P < .000,000,1,
Strains III and V: χ² = 147.20, P < .000,000,1,
Strains IV and V: χ² = 23.90, P = .2470.


Statistically therefore there is not the faintest resemblance whatever between 
any pair of these strains except the IV and V. These strains are for practical 
purposes interchangeable. In one out of every four trials two pairs of samples of 
500 from the same trypanosome population would give results more divergent than 
those observed. But what is the source of this resemblance? Why are these two 
strains alike and all the others widely divergent? There is nothing whatever in 
the paper to account for this agreement, and it is the more remarkable because 
Strains IV and V are to the statistician the most compound looking of all the 
strains. But some uniformity of origin or treatment has caused the two components
to appear in like proportions, and at the back of this resemblance there is
some vital point, if we could follow it up. Were the two dogs bitten by the same
fly, or Rats 658 and 660 really inoculated from the same dog? There is a
point here which ought to be cleared up, for otherwise the statistician could only
conclude that the wild G. morsitans strains are widely divergent, and that their
compound nature suggests that the tsetse fly carries various types of trypanosomes
and these in varying proportions.

* R. S. Proc. Vol. 86, B, p. 409 et seq.


(b) I now turn to the five human strains dealt with by Sir David Bruce and
his colleagues. Let us first consider the human strains compounded from various
animals. The following table gives the length distributions:

Human Strains. A: Compounds from Various Animals*.

Microns.

| Strain | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 |
| I, Mkanyanga | 4 | 19 | 42 | 63 | 81 | 75 | 91 | 65 | 66 | 93 | 91 | 107 |
| II, E | 2 | 2 | 12 | 55 | 108 | 159 | 210 | 188 | 215 | 177 | 138 | 83 |
| III, Chituluka | 1 | 8 | 48 | 81 | 78 | 71 | 44 | 46 | 56 | 53 | 98 | 120 |
| IV, Chipochola | 2 | 4 | 32 | 68 | 110 | 101 | 109 | 106 | 95 | 95 | 74 | 64 |
| V, Chibibi | 1 | 8 | 20 | 58 | 117 | 122 | 123 | 107 | 93 | 98 | 63 | 51 |
| Sum | 10 | 41 | 154 | 325 | 494 | 528 | 577 | 512 | 525 | 511 | 464 | 425 |

Human Strains. A: Compounds from Various Animals (continued).

| Strain | Totals |
| I, Mkanyanga | 1220 |
| II, E | 1500 |
| III, Chituluka | 1500 |
| IV, Chipochola | 1000 |
| V, Chibibi | 1000 |
| Sum | 6220 |


We may compare the strains precisely as in the case of the wild G. morsitans
strains. We find:

Strains I and II: χ² = 408.50, P < .000,000,01,
Strains I and III: χ² = 204.99, P < .000,000,01,
Strains I and IV: χ² = 180.63, P < .000,000,01,
Strains I and V: χ² = 205.40, P < .000,000,01,
Strains II and III: χ² = 923.62, P < .000,000,001,
Strains II and IV: χ² = [illegible], P < .000,000,5,
Strains II and V: χ² = [illegible], P < .000,000,5,
Strains III and IV: χ² = 531.32, P < .000,000,01,
Strains III and V: χ² = 563.82, P < .000,000,01,
Strains IV and V: χ² = 16.81, P = .7733.

* R. S. Proc. Vol. 86, B, pp. 287, 291, 295, and 297. For Strain I see R. S. Proc. Vol. 85, B, p. 423.
Again we have the remarkable result that all the human strains are statistically
divergent beyond any possible comparison, except those of Chipochola and
Chibibi which show a high degree of correspondence. Now is this result the 
outcome of treatment? We note the following diversity of hosts : 


| Host | Strain I: Gross | Strain I: Percentage | Strain II: Gross | Strain II: Percentage | Strain III: Gross | Strain III: Percentage | Strain IV: Gross | Strain IV: Percentage | Strain V: Gross | Strain V: Percentage |
| Men | 60 | 4.9 | - | 0.0 | - | 0.0 | - | 0.0 | - | 0.0 |
| Monkey | 100 | 8.2 | 160 | 10.7 | 160 | 10.7 | 160 | 16.0 | 160 | 16.0 |
| Goat | 20 | 1.6 | 60 | 4.0 | 80 | 5.3 | 80 | 8.0 | 80 | 8.0 |
| Sheep | 60 | 4.9 | 20 | 1.3 | - | 0.0 | - | 0.0 | - | 0.0 |
| Dog | 260 | 21.3 | 260 | 17.3 | 260 | 17.3 | 260 | 26.0 | 260 | 26.0 |
| Guinea Pig | 120 | 9.8 | - | 0.0 | - | 0.0 | - | 0.0 | - | 0.0 |
| Rat | 600 | 49.2 | 1000 | 66.7 | 1000 | 66.7 | 500 | 50.0 | 500 | 50.0 |
| Totals | 1220 | - | 1500 | - | 1500 | - | 1000 | - | 1000 | - |


Now it will be clear at once that the percentages of trypanosomes drawn from 
various types of host are identical only in the case of Strains IV and V, which we 
have found in close accordance. But there is not great divergence in source 
between Strains II and III although Strain I shows fairly wide differences. We 
find, however, that II and III are statistically very unlike, the next closest 
resemblances, although very slight, being between II and IV and V. It would 
not seem therefore that the degree of similarity is wholly determined by similarity 
of hosts. I have accordingly reinvestigated the five human strains by taking rats 
only. But, of course, even then it is of vital importance to be certain that the 
process of transfer from man to rat was the same in all five cases, and of this no 
evidence is provided. 


Human Strains. B: From Rat only*.

Microns.

| Strain | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 |
| I, Mkanyanga | - | [illegible] | 1 | 21 | 40 | 52 | 49 | 80 | 31 | 36 | 33 | 48 | 52 |
| II, E, Rat 728 | - | - | 2 | 4 | 15 | 30 | 57 | 72 | 85 | 72 | 59 | 44 | 26 |
| II, E, Rat 726 | - | - | 2 | 24 | 30 | 42 | 60 | 61 | 87 | 78 | [illegible] | [illegible] | [illegible] |
| III, Chituluka, Rat 952 | 1 | 3 | 21 | 27 | 23 | 15 | 10 | 15 | 19 | 21 | 34 | 44 | 36 |
| III, Chituluka, Rat 953 | - | [illegible] | 17 | 26 | 20 | 19 | 15 | 14 | 26 | 18 | 33 | 40 | 34 |
| IV, Chipochola, Rat 1337 | - | - | 4 | 6 | 16 | 29 | 53 | 61 | 59 | 69 | 56 | 51 | 36 |
| V, Chibibi, Rat 1660 | - | - | - | 4 | 17 | 29 | 46 | 32 | 69 | 73 | 52 | 40 | 31 |
| Sum | 1 | 5 | 47 | 112 | 161 | 216 | 290 | 316 | 376 | 362 | 322 | 294 | 235 |

* R. S. Proc. Vol. 86, B, pp. 288, 289, 292, 293, 295, and 298. For Strain I see R. S. Proc. Vol. 85, B, p. 423.

Human Strains. B: From Rat only (continued).

| Strain | Totals |
| I, Mkanyanga | 600 |
| II, E, Rat 728 | 500 |
| II, E, Rat 726 | 500 |
| III, Chituluka, Rat 952 | 500 |
| III, Chituluka, Rat 953 | 500 |
| IV, Chipochola, Rat 1337 | 500 |
| V, Chibibi, Rat 1660 | 500 |

| Microns | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | Totals |
| Sum | 219 | 210 | 134 | 108 | 88 | 57 | 28 | 9 | 5 | 4 | 1 | 3600 |


This table with its two pairs of rats inoculated from the same strains is
peculiarly instructive. We can compare II, Rat 726, with II, Rat 728.

We find: χ² = 36.195, giving P = .0048.

This is far from the high degree of divergence we have found between the compound
human strains, but it is not satisfactory as a measure of the agreement of
the same strain in two hosts of the same species.

Applying the same test to the two Rats 952 and 953 of Strain III we have:
χ² = 14.715, giving P = .9038.


This is, of course, quite satisfactory. We should not hesitate to assert identity 
of strains and of treatment in the case of the trypanosomes from these two rats. 
The statistician will feel fairly confident that there is a factor of divergence 
between the trypanosomes of the two rats in Strain II, which does not occur in
the two rats of Strain III. He will be almost certain that the strain was not 
conveyed through the same steps or at the same stage of the disease to the rats in 
Strain II. Unfortunately dates and processes are not discussed. Sir David Bruce 
and his colleagues say that it is remarkable how much alike these distributions for 
Rats 726 and 728 are, and again for the distributions for Rats 952 and 953 that 
they also closely resemble each other. “It is curious and striking that the same 
strain of trypanosome growing in two different animals should show this remarkable 
similarity*.” The interesting point is that the statistician would agree with the 
remarkable similarity in the latter case, but the divergence not the remarkable 
resemblance in the first case would force him to seek for some explanation in 
treatment. It will, I think, be clear from these illustrations that a strain of 
trypanosomes, even if obviously compound, can be taken from a single source and 
after inoculation into two different individuals of the same species be identified 
as same; but to insure this result on every repetition the greatest caution will 
have to be exercised as to identity of process and treatment. 


* R.S. Proc. Vol. 86, B, pp. 289 and 293. 
There are still further results of importance to be ascertained, however, from
our table of human strains. Let us compare Strains IV and V, which we found
resembled each other closely even for compounded hosts. We now reach

χ² = 14.085 and P = .5229.

Or, the probability that these two strains are identical has been reduced by
selecting out the rat data only. But the result is still so high that no one would
hesitate to assert that Chipochola and Chibibi were suffering from a disease due
to the same strain of trypanosome. The correspondence is so close that we have
combined Strains IV and V for all other comparisons. In the case of Strain III,
we have added together the results for Rats 952 and 953. Such addition is less
reasonable for Rats 726 and 728, but without doing this, it is impossible to decide
which rat is to represent the E strain. I have then made the following comparisons:

Strains IV and V with III: χ² = 525.67, P < .000,000,01.

There is accordingly no similarity at all between the Chituluka strain and that
common to Chipochola and Chibibi.

Strains IV and V with II: χ² = 64.70, P < .000,001.


Thus the strain from the European E from Portuguese East Africa diverges 
from the Nyasaland strain widely, but not as widely as that of Chituluka does 
from those of Chipochola and Chibibi. 


Strain I with Strain III: χ² = 126.13, P < .000,000,1,
Strain I with IV and V: χ² = 217.82, P < .000,000,01.


Thus the trypanosomes from Mkanyanga are widely divergent from those of 
the three other Nyasaland cases. Nor are they any closer to the European E: 


Strain I with Strain II: χ² = 331.37, P < .000,000,01.


Thus with the exception of the Chipochola and Chibibi strains, the trypanosome 
distributions from human sources differ widely. Nor is this to be wondered at, if 
the human beings owe their trypanosomes to Glossina morsitans, for in that case 
we should expect the human strains to be as diverse as we have found those from 
the tsetse fly itself. It would remain to explain the close similarity of the 
Chipochola and Chibibi cases. It would be interesting to know the history of 
these cases with regard to locality and to the possibility of a unique source 
of infection. 


(c) In the case last dealt with, namely that of Chipochola and Chibibi, we 
have the remarkable feature that the strains although significantly identical, 
whether treated in the rat alone or in compounded distributions from various hosts, 
resemble each other somewhat less closely in the single host series. This is not 
generally the rule. Some of the big divergencies we have already noticed become 
far less appreciable, nay, even become resemblances when we confine our attention 
to one species of host. The chief misfortune which then too often arises is the 
paucity of the total numbers that we have at our disposal. I will consider,
however, from this aspect the relations of the three strains wild G. morsitans, 
wild game, and Mvera cattle. 

I compare first the lengths of 200 trypanosomes from wild G. morsitans and 
wild-game strains. These yield for the host, goat* : 


Microns.

| From Goat | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | Totals |
| Wild G. morsitans Strain | 1 | 3 | 12 | 21 | 55 | 60 | 32 | 12 | 4 | - | - | 200 |
| Wild-Game Strain | - | - | 1 | 16 | 37 | 73 | 38 | 26 | 8 | 1 | - | 200 |

giving: χ² = 26.782 and P = .0015.
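
The arithmetic of such a comparison is easily repeated. The following is a minimal modern sketch in Python, not part of the original memoir, applied to the two goat series just given (lengths 9 to 18 microns). It assumes the usual two-sample form χ² = Σ N N′ (f/N − f′/N′)² / (f + f′), summed over the occupied length classes, and it assumes that P is read from the χ² distribution with one degree of freedom fewer than the number of occupied classes, since that convention reproduces the printed value. The function names are illustrative only.

```python
import math

# Frequencies for lengths 9..18 microns, host: goat (from this section).
morsitans_goat = [1, 3, 12, 21, 55, 60, 32, 12, 4, 0]
wild_game_goat = [0, 0, 1, 16, 37, 73, 38, 26, 8, 1]


def two_sample_chi2(f, g):
    """Chi-squared measure of divergence between two frequency series.

    Classes empty in both series are skipped. Returns (chi2, cells used).
    """
    n, m = sum(f), sum(g)
    chi2, cells = 0.0, 0
    for a, b in zip(f, g):
        if a + b == 0:
            continue
        chi2 += n * m * (a / n - b / m) ** 2 / (a + b)
        cells += 1
    return chi2, cells


def chi2_sf(x, df):
    """Upper tail probability of chi-squared, via the series expansion of
    the regularised lower incomplete gamma function (standard library only).
    Adequate for moderate x; very small tails lose precision."""
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total and k < 10000:
        term *= h / (a + k)
        total += term
        k += 1
    lower = total * math.exp(-h + a * math.log(h) - math.lgamma(a))
    return 1.0 - lower


chi2, cells = two_sample_chi2(morsitans_goat, wild_game_goat)
p = chi2_sf(chi2, cells - 1)   # assumed convention: df = occupied classes - 1
print(f"chi2 = {chi2:.3f} on {cells} classes, P = {p:.4f}")
```

The same two functions may be applied to any other pair of series in this section whose frequencies are legible.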


To further test this, I take the same two strains in the dog as host†:

Microns.

| From Dog | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | Totals |
| Wild G. morsitans Strain | 3 | 14 | 34 | 41 | 40 | 19 | 9 | - | 160 |
| Wild-Game Strain | - | 12 | 31 | 57 | 50 | 24 | 6 | - | 180 |

Here χ² = 7.045 and P = .3171.


The value we had previously found for a mixture of all strains was P = .0002.
Thus the two strains may be considered as identical when we deal with the 
trypanosomes from the dog, as showing considerable divergence when we take the 
goat, and as showing marked divergence when we take a great variety of hosts. 
The weight of evidence in favour of a standardised treatment thus becomes very 
great. 


Let us look at precisely the same material for the wild-game strain and for 
the Mvera cattle strain, first for the goat and then for the dog as host‡. The
grave difficulty is the paucity of measurements thus differentiated.

Microns.

| From Goat | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | Totals |
| Wild-Game Strain | - | - | 1 | 16 | 37 | 73 | 38 | 26 | 8 | 1 | 200 |
| Mvera Cattle Strain | 1 | 1 | 3 | 14 | 22 | 26 | 19 | 13 | 1 | - | 100 |

This gives χ² = 14.670, leading to P = .1013.

* R. S. Proc. Vol. 87, B, pp. 6 and 11.
† R. S. Proc. Vol. 87, B, pp. 6 and 11.
‡ R. S. Proc. Vol. 87, B, pp. 3 and 5.
Microns.

| From Dog | 11 | 12 | 13 | 14 | 15 | 16 | 17 | Totals |
| Wild-Game Strain | - | 12 | 31 | 57 | 50 | 24 | 6 | 180 |
| Mvera Cattle Strain | 3 | 11 | 27 | 30 | 21 | 8 | - | 100 |

This leads to χ² = 15.992, P = .0138.


Previously (p. 98) on the total series of different hosts we had found
P = .000,243. Thus by referring our material to individual hosts, we have reduced
the degree of divergency between the wild-game and Mvera cattle strains, but it 
would be still hazardous to state that these strains are identical. 


Lastly, we turn to the Mvera cattle strain and the wild G. morsitans strain
dealing with dog and goat as hosts separately*:

Microns.

| From Goat | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | Totals |
| Wild G. morsitans Strain | 1 | 3 | 12 | 21 | 55 | 60 | 32 | 12 | 4 | 200 |
| Mvera Cattle Strain | 1 | 1 | 3 | 14 | 22 | 26 | 19 | 13 | 1 | 100 |

This gives χ² = 7.968, P = .4368.

And again:

Microns.

| From Dog | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | Totals |
| Wild G. morsitans Strain | - | - | 3 | 14 | 34 | 41 | 40 | 19 | 9 | 160 |
| Mvera Cattle Strain | - | - | 3 | 11 | 27 | 30 | 21 | 8 | - | 100 |

resulting in χ² = 11.120, P = .0852.

The Mvera cattle strain and the Glossina morsitans strain had for all hosts a
divergence measured by P = .000,008. Thus the great bulk of this divergence
is due to multiplicity of hosts†.


To sum up the results obtained for T. pecorum in Mvera cattle, wild G. morsitans
and wild-game strains, the identification of these strains was quite illegitimate on
the basis of the compound host frequencies. It is reasonable on the basis of
trypanosomes taken from a single species of host. But how far the resemblance
in these cases is produced by a selective influence of the host and not necessarily
by an identity of all the members of the strain before transference to the host is
not demonstrated.

* R. S. Proc. Vol. 87, B, pp. 3, 10-11.

† It is worthy of note that in comparisons with the cattle strain the goat appears to give closer
results than the dog, but the dog appears the better in the comparison of the G. morsitans and
wild-game strains.


On the other hand while divergence due to host will account for the divergences 
which are so notable in T. pecorum, it will not account for the divergences in the
human strains; these are startlingly conspicuous even if we confine our attention 
to a single species of host. Precisely the same remarks apply to the trypanosomes 
similar to those causing disease in human beings found in wild game and in the 
tsetse fly itself. There must be another source for these divergences. 


(7) Discussion of the Heterogeneity which is statistically demonstrable in the 
bulk of the Trypanosome Measurements. 


The reader who has attentively followed the course of the argument in the 
previous sections will be prepared for the next step in this memoir, the attempt to 
account for the large divergences between strains of trypanosomes in individuals 
of the same species by the heterogeneity of those strains. My suggestion is that 
the strain in one fly differs from that in another because the components do not 
appear in the same proportion, the strain in one specimen of wild game from that 
in another, or in one man from that in another because they have been bitten by 
a fly containing the components in unlike proportions. The host does make some 
difference, either by nutrition or selection of trypanosomes, but it is a minor differ- 
ence. Thus consider what we may probably hold to be pure strains and observe 
the average differences in length found by Sir David Bruce and his colleagues: 


Microns.

| T. simiae* | T. caprae† | T. pecorum, Mvera Cattle‡ | T. pecorum, Wild G. morsitans§ |
| Goat 17.3 | Waterbuck 26.8 | Donkey 13.5 | Goat 13.5 |
| Monkey 18.1 | Ox 25.7 | Ox 14.2 | Monkey 13.6 |
| - | Goat 25.3 | Goat 13.8 | Dog 14.2 |
| - | Sheep 25.6 | Dog 13.8 | Guinea Pig 14.6 |
| - | - | Rat 14.8 | Rat 14.0 |
| Max. Difference 0.8 | Max. Difference 1.5 | Max. Difference 1.3 | Max. Difference 1.1 |

We may thus anticipate that in a pure strain the change of host would hardly
make a difference of more than 2 microns in the average length. We must
accordingly be prepared for some such change as this in the shifting of the mean
when the host is varied.

* R. S. Proc. Vol. 85, B, p. 479.
† R. S. Proc. Vol. 86, B, p. 279.
‡ R. S. Proc. Vol. 87, B, p. 3.
§ R. S. Proc. Vol. 87, B, p. 10.

We have next to inquire what type of curve accurately describes the strains 
which we are fairly certain are homogeneous. 


If the reader will turn back to p. 110 he will note at once a marked difference
between the distributions for T. caprae, T. pecorum and T. simiae when compared
with those entitled Mzimba strain, human strain, wild-game strain, T. brucei,
T. rhodesiense, T. gambiense and the wild G. morsitans strain. The coefficients of
variation of the former group are all under 9.5 (mean = 9.00), the coefficients of
variation of the latter group are all over 13.5 (mean = 17.29). We recognise
therefore a totally different order of variability. Even in absolute variation as
measured by the standard deviations we find the first group with its mean
S.D. = 1.68 and the second with its mean 3.96. An examination of the graphs
scattered through the trypanosome papers to which we have referred will, we think,
convince the statistician that we have to deal with heterogeneous and not skew
homogeneous material*. It becomes of course important to ascertain whether in
the pure strains a Gaussian curve will suffice to describe the frequency closely
enough for statistical purposes, for, if it does, the analysis into at any rate two
Gaussian components of the heterogeneous strains becomes relatively direct, if
laborious. I will consider the T. pecorum, T. simiae, and T. caprae strains
from this standpoint.


(a) T. pecorum (see p. 110).
Mean = 13.992 microns. S.D. = 1.2816 microns.

| Microns | Observed Values | Calculated Values |
| 9 and under | 2 | 0.46 |
| 10 | 6 | 5.98 |
| 11 | 42 | 45.41 |
| 12 | 193 | 192.52 |
| 13 | 452 | 456.70 |
| 14 | 618 | 607.12 |
| 15 | 453 | 452.49 |
| 16 | 178 | 188.98 |
| 17 | 51 | 44.16 |
| 18 and over | 5 | 6.20 |

χ² = 7.630, P = .572.

Hence in 57 out of 100 trials from material following the Gaussian distribution
a more divergent sample than that observed would actually be obtained. We can
therefore conclude that a simple Gaussian frequency adequately describes the
distribution in size of T. pecorum. This is illustrated in Diagram II.
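
The calculated values of such a fit can be regenerated from the mean and standard deviation alone. The following is a minimal modern sketch in Python, not part of the original memoir. It assumes that each length class extends half a micron on either side of its nominal value, that the two end classes absorb the tails, and that χ² is the usual sum of (observed − calculated)² / calculated. Because it works with unrounded calculated values, its χ² agrees with the printed figure only approximately.

```python
import math

# Observed T. pecorum frequencies for lengths 9..18 microns (2000 in all),
# with the mean and S.D. given in the text.
lengths = list(range(9, 19))
observed = [2, 6, 42, 193, 452, 618, 453, 178, 51, 5]
mean, sd = 13.992, 1.2816
n = sum(observed)


def normal_cdf(x, mu, sigma):
    """Cumulative Gaussian probability; also valid for x = +/- infinity."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


expected = []
for i, x in enumerate(lengths):
    # Assumed class limits: x - 0.5 to x + 0.5, end classes open-ended.
    lo = -math.inf if i == 0 else x - 0.5
    hi = math.inf if i == len(lengths) - 1 else x + 0.5
    expected.append(n * (normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)))

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

for x, o, e in zip(lengths, observed, expected):
    print(f"{x:>3} {o:>5} {e:>8.2f}")
print(f"chi2 = {chi2:.3f}")
```

Most of the χ² comes from the lowest class, where the calculated frequency is less than one-half of a unit, so the total is sensitive to the rounding of that single entry.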

* Note especially the bimodal graphs in R. S. Proc. Vol. 83, B, pp. 5 and 11, for both the Uganda
and Zululand strains of T. brucei, in Vol. 86, B, pp. 291-293, for human strains, in Vol. 86, B,
pp. 395, 397 for wild-game strains and pp. 409, 411, 417 and 419 for G. morsitans strains.
DIAGRAM II. Gaussian fitted to T. pecorum Frequency.

(b) T. simiae (see p. 110).
Mean = 17.870 microns. S.D. = 1.6558 microns.

| Microns | Observed Values | Calculated Values |
| 14 and under | 7 | 10.46 |
| 15 | 28 | 27.63 |
| 16 | 76 | 63.92 |
| 17 | 93 | 103.78 |
| 18 | 126 | 118.32 |
| 19 | 92 | 94.66 |
| 20 | 47 | 53.18 |
| 21 | 22 | 20.96 |
| 22 | 6 | 5.80 |
| 23 and over | 3 | 1.29 |

χ² = 8.149, P = .520.

DIAGRAM III. Gaussian fitted to T. simiae Frequency.
We conclude that the Gaussian adequately describes the distribution of
T. simiae. In more than half the trials we should get a worse sample. See for
graphical fit, Diagram III.

(c) T. caprae (see p. 110).

Mean = 25.508 microns. S.D. = 2.1011 microns.

| Microns | Observed Values | Calculated Values |
| 20 and under | 4 | 4.98 |
| 21 | 8 | 9.82 |
| 22 | 23 | 23.95 |
| 23 | 49 | 46.74 |
| 24 | 79 | 73.05 |
| 25 | 95 | 91.38 |
| 26 | 80 | 91.54 |
| 27 | 68 | 73.45 |
| 28 | 57 | 47.16 |
| 29 | 24 | 24.26 |
| 30 | 9 | 9.98 |
| 31 and over | 4 | 4.38 |

χ² = 5.175, P = .92.

This is a still more excellent fit; if the Gaussian represented the population,
in 92% of samples we should get a more divergent sample than that observed.
The curve is given in Diagram IV.

DIAGRAM IV. Gaussian fitted to T. caprae Frequency.


It will be clear from the above three illustrations of what we may term 
homogeneous trypanosome strains that the Gaussian curve of frequency suffices 
to describe adequately such material. It is equally clear that no Gaussian can 
possibly describe such skew distributions as we get in the wild-game strain or wild
tsetse fly strain of the trypanosome species identified by Sir David Bruce and
colleagues as T. rhodesiense*. It is equally impossible in the case of the human
strains figured in the paper of February 1913†. I illustrate this on the frequency
distribution for 6220 trypanosomes of human strains‡.


| Microns | Observed | Calculated |
| 14 and under | 1 | 75.45 |
| 15 | 10 | 62.51 |
| 16 | 41 | 101.51 |
| 17 | 154 | 155.50 |
| 18 | 325 | 224.73 |
| 19 | 494 | 306.27 |
| 20 | 528 | 393.73 |
| 21 | 577 | 477.39 |
| 22 | 512 | 545.93 |
| 23 | 525 | 588.91 |
| 24 | 511 | 599.17 |
| 25 | 464 | 575.04 |
| 26 | 425 | 520.55 |
| 27 | 372 | 444.42 |
| 28 | 347 | 357.96 |
| 29 | 307 | 271.88 |
| 30 | 198 | 194.81 |
| 31 | 167 | 131.68 |
| 32 | 123 | 83.91 |
| 33 | 77 | 50.44 |
| 34 | 36 | 28.61 |
| 35 | 12 | 15.30 |
| 36 and over | 14 | 14.18 |

Here χ² = 501 and P < .000,000,001. In other words description by a Gaussian
is absolutely impossible. The histogram of observations and the curve are shewn
on Diagram V.

Now the suggestion that flowed at once from these results was the compound 
nature of all the material classed under the headings : 


(i) T. rhodesiense.
(ii) T. brucei.
(iii) T. gambiense.
(iv) Mzimba Strain.
(v) Wild G. morsitans Strain.
(vi) Wild-Game Strain.
(vii) Human Strain.


With the experience of the Gaussian fitting the homogeneous strains, the direct 
step was to investigate whether the above material could be analysed into two 
Gaussian components and to determine how nearly these components were in 
agreement. The method of carrying out this analysis was provided in the first 
of my series of Contributions to the Mathematical Theory of Evolution§. There
was nothing to prevent the process being applied to every individual frequency 
given by the trypanosome workers, except the very laborious arithmetic. The 
method was applied to the above seven cases, and also (viii) for the purposes of 
illustration to a single human case, that of Chituluka, a native of Nyasaland, who 

* See R. S. Proc. Vol. 86, B, pp. 407 and 419.
† See R. S. Proc. Vol. 86, B, pp. 285 et seq.
‡ See R. S. Proc. Vol. 86, B, p. 300.
§ Phil. Trans. Vol. 185, A, pp. 71-110, 1894.

A Study of Trypanosome Strains

DIAGRAM V. Failure of Gaussian to fit Frequency Distribution of Human Trypanosomes. Total 6220.

KARL PEARSON


died of sleeping sickness*. With the single exception of T. brucei every one of
these distributions broke up into two components, and into two components with
strikingly close means. I propose to call these two components T. minus and
T. majus. I do not assert that they are distinct species; they may be dimorphic
groups of one and the same trypanosome species. But the recognition of their
existence seems to bring some order at least into the chaos we have already noted
as existing in the trypanosome measurements. Two human strains or two wild-game
strains differ from each other with such wide divergence in their frequencies
because these two groups T. minus and T. majus are mixed in the individual
in different proportions.
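
A simple check on any such resolution is that the two components, taken together, must return the mean and standard deviation of the whole strain. The following is a minimal modern sketch in Python, not part of the original memoir, and it does not reproduce the method of moments by which the components were found. It uses the constants tabulated in this section for the Mzimba strain, reading the component sizes as 634.96 and 365.04 and the whole-strain mean and standard deviation as 21.413 and 2.9586 microns; the function name is illustrative only.

```python
import math


def mixture_mean_sd(components):
    """Mean and S.D. of a compound of Gaussian components.

    components: list of (size, mean, sd) triples. The compound second moment
    about zero is the size-weighted average of sd**2 + mean**2.
    """
    total = sum(size for size, _, _ in components)
    mean = sum(size * m for size, m, _ in components) / total
    second = sum(size * (s * s + m * m) for size, m, s in components) / total
    return mean, math.sqrt(second - mean * mean)


# Mzimba strain: (size, mean, S.D.) for T. minus and T. majus,
# as read from the table of components in this section.
mzimba = [(634.96, 19.8966, 1.3961),
          (365.04, 24.0508, 3.1028)]

mean, sd = mixture_mean_sd(mzimba)
print(f"compound mean = {mean:.3f}, compound S.D. = {sd:.4f}")
```

The same check applied to two strains with identical components but different sizes shows at once how a change of proportions alone shifts the mean and alters the variability of the compound.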


The table below gives the chief biometric characters of T. minus and T. majus
as found from the seven resolutions. The mean values of the constants for
T. minus and for T. majus are placed at the foot; in calculating these mean values,
Chituluka's data have been excluded as already included in the human strain, and
also those for T. brucei not directly resolved.

| Strain | Mean: T. minus | Mean: T. majus | Standard Deviation: T. minus | Standard Deviation: T. majus | Coefficient of Variation: T. minus | Coefficient of Variation: T. majus | Size of Population: T. minus | Size of Population: T. majus |
| T. rhodesiense | 18.7418 | 26.1122 | 2.3184 | 3.4397 | 12.370 | 13.173 | [illegible] | [illegible] |
| T. brucei | 19.8244 | 26.1122 | 2.6439 | 3.4134 | 13.337 | 13.072 | [illegible] | [illegible] |
| T. gambiense | 19.8926 | 26.2463 | 2.0566 | 2.6260 | 10.339 | 10.005 | [illegible] | [illegible] |
| Mzimba Strain | 19.8966 | 24.0508 | 1.3961 | 3.1028 | 7.017 | 12.901 | 634.96 (63.5%) | 365.04 (36.5%) |
| G. morsitans Strain | 19.6475 | 27.1966 | 1.7503 | 2.7013 | 8.908 | 9.932 | [illegible] | [illegible] |
| Wild-Game Strain | 20.4418 | 25.8263 | 1.6332 | 2.8799 | 7.990 | 11.151 | [illegible] | [illegible] |
| Human Strain | 20.3687 | 26.2930 | 1.9444 | 3.4470 | 9.536 | 13.110 | [illegible] | [illegible] |
| Chituluka | 19.8410 | 28.7875 | 1.9785 | 2.8823 | 9.972 | 10.012 | [illegible] | [illegible] |
| Means | 19.8315 | 25.9542 | 1.8498 | 3.0328 | 9.360 | 11.712 | - | - |
| T. simiae | 17.870 | - | 1.6558 | - | 9.270 | - | 100% | - |
| T. caprae | - | 25.508 | - | 2.1011 | - | 8.580 | - | 100% |

* R. S. Proc. Vol. 86, B, p. 290.

At the foot of the table I have placed the constants for T. simiae and T. caprae,
the nearest pure strains to T. minus and T. majus respectively. I do not in the
least suggest there is any identity, but comparison may bring home to the
trypanosome worker the average sizes of the two components*. The differences 
of the variabilities are, however, much larger, and the influence of host on 
variability as well as on mean ought to be studied. 

It will be seen at once that the divergence in the individual means of T. minus 
from the general mean is very slight, at most a micron, and well within the limits 
which arise, as we have seen, from difference of host. It is a most remarkable fact 
that from six independent reductions the mean size of T. minus should come 
out so nearly 19.8 microns. In T. majus the correspondence is not so good; the 
average of about 26 microns falls to 24 in the Mzimba strain and rises to 28.8 in 
the case of Chituluka†. Still it does not appear to me that these changes of 
mean of the T. majus component are absolutely beyond the variation due to 
differences of host and treatment. Another more serious matter is the comparatively 
wide range found for the variabilities; but even here it is impossible to assert that 
such differences will not occur with difference of host. For example the Mvera 
cattle strain, a fair sample of the simple T. pecorum, gives: 


Host | Mean  | Standard Deviation | Coefficient of Variation
Goat | 13.80 | 1.462              | 10.592
Rat  | 14.75 | .839               | 5.689
Dog  | 13.79 | 1.087              | 7.885


Here while the means are within one micron, the differences in variability are 
of the same order as those found in T. majus from different hosts. 
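The coefficients of variation in the small table above are simply 100 times the standard deviation divided by the mean. A minimal sketch of that arithmetic follows; the function name and the dictionary layout are illustrative assumptions, while the figures are those quoted for the Mvera cattle strain.

```python
def coefficient_of_variation(mean, sd):
    """Return the coefficient of variation, 100 * sd / mean, in per cent."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return 100.0 * sd / mean


# Mean and standard deviation (microns) quoted for T. pecorum in three hosts.
PECORUM = {"Goat": (13.80, 1.462), "Rat": (14.75, 0.839), "Dog": (13.79, 1.087)}

if __name__ == "__main__":
    for host, (mean, sd) in PECORUM.items():
        print(f"{host}: {coefficient_of_variation(mean, sd):.3f}")
```

The recomputed values agree with the printed ones to within a few units in the third decimal place, the small differences presumably arising from the rounding of the printed means and standard deviations.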


Again, taking a pure homogeneous strain as T. caprae with goat and sheep as 
host, which are scarcely so differentiated as man and antelope, we find: 


Host  | Mean  | Standard Deviation | Coefficient of Variation
Goat  | 25.31 | 2.187              | 8.642
Sheep | 25.60 | 1.923              | 7.512


Lastly, taking T. simiae for goat and monkey we have: 


Host   | Mean  | Standard Deviation | Coefficient of Variation
Monkey | 17.26 | 1.403              | 8.127
Goat   | 18.11 | 1.687              | 9.315


* The maximum average length of T. caprae is 26.8 in the waterbuck and of T. simiae 18.1. 

† It should be noted that with the whole of the human data the mean is 26.33 and that Chituluka's 
mean is very exceptional. 
I think we may conclude that, allowing for the errors of random sampling 
and the errors arising from the resolving process, the deviations observed in the 
variability of our two components do not invalidate the hypotheses : 


(i) That the widely divergent results obtained from different strains are due 
to the existence in the same individual of two types of trypanosome with very 
varying percentages from individual to individual. 


(ii) That one of these types has a mean length of about 19.8 microns and a 
variability of about 1.8 microns, the other a mean of about 26.0 microns and 
a variability of about 3.0 microns. The means may vary 1 or 2 microns with the 
nature of the host and the variability 0.5 to 1 micron. 


The large type predominates in the Nyasaland human strains*, on the average 
in about the ratio of 3 to 2, but the smaller type predominates in the G. morsitans 
and wild-game strains in about the same ratio; while in the trypanosomes classed 
as T. rhodesiense and T. gambiense, as well as in the strain from the Mzimba 
donkey, the preponderance is still of the smaller type and the ratio approaches 
13 to 7. Whether these ratios are peculiar to the host or due to the infecting fly, 
it is not at present possible to determine. But the hypothesis of the existence of 
these two types, whether as a dimorphism of T. rhodesiense or as independent 
species, seems to bring some order into the apparent chaos of recent trypanosome 
measurements. 


The following paragraphs give the calculated constants of the reductions, and 
the numbers of the diagrams showing the nature of the compound frequencies: 

(i) T. rhodesiense. 

Mean = 23.577, 
$\mu_2 = 21.86874$, $\mu_4 = 1079.10255$, 
$\mu_3 = +4.01986$, $\mu_5 = +1105.74834$. 

Reducing nonic: 

$$24q^9 - 298.7232q^7 - 0.5817q^6 + 1144.7684q^5 + 34.7620q^4 - 1179.2495q^3 + 12.9808q^2 + 0.0891q + 0.0001 = 0,$$

where† $p_2 = -10q$. 

The root is $p_2 = -12.2578$. This leads to the two components in the Table 
p. 125. The histogram of the observations and the two component Gaussian 
curves with their compound are given in Diagram VI. 
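The reducing nonic is quoted here only in numerical form. The sketch below rebuilds its coefficients from the four moments, assuming the form of the nonic in the memoir cited in the footnote, with $\lambda_4 = 9\mu_2^2 - 3\mu_4$ and $\lambda_5 = 30\mu_2\mu_3 - 3\mu_5$; that form is recalled rather than quoted from this paper, and the function names are illustrative. Only the substitution $p_2 = -10q$ is taken from the text.

```python
def pearson_nonic_in_q(mu2, mu3, mu4, mu5):
    """Coefficients of the reducing nonic in q, highest power first (q^9 ... q^0).

    Assumed form (in p2), to be checked against the cited memoir:
      24 p^9 - 28 L4 p^7 + 36 mu3^2 p^6 - (24 mu3 L5 - 10 L4^2) p^5
      - (148 mu3^2 L4 + 2 L5^2) p^4 + (288 mu3^4 - 12 L4 L5 mu3 - L4^3) p^3
      + (24 mu3^3 L5 - 7 mu3^2 L4^2) p^2 + 32 mu3^4 L4 p - 24 mu3^6 = 0
    """
    lam4 = 9.0 * mu2 ** 2 - 3.0 * mu4
    lam5 = 30.0 * mu2 * mu3 - 3.0 * mu5
    in_p = [
        24.0,
        0.0,  # no p^8 term
        -28.0 * lam4,
        36.0 * mu3 ** 2,
        -(24.0 * mu3 * lam5 - 10.0 * lam4 ** 2),
        -(148.0 * mu3 ** 2 * lam4 + 2.0 * lam5 ** 2),
        288.0 * mu3 ** 4 - 12.0 * lam4 * lam5 * mu3 - lam4 ** 3,
        24.0 * mu3 ** 3 * lam5 - 7.0 * mu3 ** 2 * lam4 ** 2,
        32.0 * mu3 ** 4 * lam4,
        -24.0 * mu3 ** 6,
    ]
    # Substitute p2 = -10 q and divide by (-10)^9 so the leading coefficient stays 24.
    return [c * (-10.0) ** (9 - i) / (-1.0e9) for i, c in enumerate(in_p)]


def evaluate(coeffs, x):
    """Evaluate a polynomial (highest power first) by Horner's rule."""
    total = 0.0
    for c in coeffs:
        total = total * x + c
    return total


if __name__ == "__main__":
    # Moments quoted for T. rhodesiense.
    coeffs = pearson_nonic_in_q(21.86874, 4.01986, 1079.10255, 1105.74834)
    for power, c in zip(range(9, -1, -1), coeffs):
        print(f"q^{power}: {c:12.4f}")
    print("value at q = 1.22578:", evaluate(coeffs, 1.22578))
```

With the T. rhodesiense moments the coefficient of $q^5$ comes out as 1144.768, and the polynomial is very nearly zero at q = 1.22578, that is at $p_2 = -12.2578$.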


The resolution is not a very good one; for 24 groups $\chi^2 = 37.48$, and P = .05, or 
once in 20 trials only we should get a worse result. But an examination of either 
the graph or the original frequency shows at once the cause of this divergence. 
In their measurements Drs Stephens and Fantham have had a strange bias in favour 


* The European from Portuguese East Africa had predominance of T. minus. See R. S. Proc. 
Vol. 86, B, p. 288. 


† Notation of the memoir Phil. Trans. Vol. 185, A, p. 84, Eqn. (29). 
of even numbers. No curve whatever could fit the data satisfactorily under the 
circumstances! Either they used a scale graduated to 2 microns only, and had a 
prejudice in favour of the scale markings, or else their even numbers were in some 
way more conspicuous than their odd. Whatever the source of this peculiarity 
may be, there can be no doubt of the bias*. 

The only way to obtain a reasonable measure of the goodness of fit in Stephens 
and Fantham's results for T. rhodesiense is to group from 10 to 12, 12 to 14 and so 
on in comparing the observed and calculated frequencies. If this be done we find 
$\chi^2 = 5.03$ for 13 groups and P = .957, a splendid fit. The frequencies are as 
follows: 


Length (microns) | Observed | Calculated
10-14            | 9        | 7.17
14-16            | 38.5     | 34.67
16-18            | 93.0     | 92.99
18-20            | 133.5    | 132.91
20-22            | 134.0    | 124.79
22-24            | 127.0    | 124.28
24-26            | 135.5    | 146.35
26-28            | 139.5    | 145.56
28-30            | 112.0    | 106.55
30-32            | …        | 56.22
32-34            | …        | 21.36
34-36            | …        | …
36-38            | …        | …
Totals           | 1000     | 999.85



(ii) T. brucei. The data for this trypanosome were taken from Sir David 
Bruce and colleagues' diagram†. I have not come across the original publication 
with the measurements involved in this diagram. Describing this species in 
July 1910‡, the authors speak of its well-marked dimorphism. This is very 
obvious in the graphs for length given for the Uganda 1909 and Zululand 1894 
strains, but the numbers given are far too slender (160 and 200 respectively) to 
justify any attempt at analytical resolution. Graphically we may take it that 
roughly the following are the means of the components: 


T. minus. T. majus. 
Uganda 1909 20 microns 28 microns 
Zululand 1894 18 microns 29 microns. 


These are not very widely divergent from the values 


19.8 microns 26.0 microns 


we have found from the seven resolutions. 


In May 1911§ the two curves for Uganda and Zululand appear to be added 
together to give a T. brucei curve of length distribution. This is again markedly 
bimodal with one component mean at 18.75 microns and the other at 27.5 microns, 
both approximative. Thus far T. brucei appears quite well to fit in with our other 
material. But in September 1911 appears the diagram of T. brucei said to be 


* Bias of this or of a similar character is not uncommon—even in the pages of this Journal. 
I remember once pointing out to a Scotch anthropometer his prejudice in favour of whole centi- 
metres. He looked at his results, recognised the bias, and then gravely told me that it was not 
due to any personal bias, but that the Creator must have designed Scotsmen on the metric scale! 

† R. S. Proc. Vol. 84, B, p. 331. 

‡ R. S. Proc. Vol. 83, B, p. 2. 

§ R. S. Proc. Vol. 84, B, p. 186. 
based on 1000 individuals. Here there is a mode about 24.0, with possibly a 
sub-mode at 19 microns, but the evidence for dimorphism has largely disappeared. 
It is very desirable that we should know the details of this curve, i.e. the nature 
of the hosts and so forth, for it apparently replaces the earlier data and remains 
the standard T. brucei distribution. It certainly shows nothing of the definite 
heterogeneity (or dimorphism) of the previous Uganda material. 

Its constants are as follows: 

Mean = 23.5290, 
$\mu_2 = 19.30583$, $\mu_4 = 996.87764$, 
$\mu_3 = 10.54837$, $\mu_5 = 2146.37930$. 

$$24q^9 - 101.8618q^7 - 4.0057q^6 + 140.6937q^5 + 62.0835q^4 - 29.3940q^3 + 11.2371q^2 + 1.4413q + 0.0331 = 0.$$

No suitable root of this equation exists and accordingly it would appear that 
this distribution is not rigidly reducible to Gaussian components. This result is 
so remarkable in view of the obviously bi-modal character of the earlier T. brucei 
distribution, and the resolution into two components of all the other seven 
distributions, said to be allied to T. brucei, that I determined to consider the 
matter further by fitting Gaussians to the 'tails' of the T. brucei distribution*. 
I chose as the right-hand 'tail' the frequency from 28 to 38 inclusive, and as the 
left-hand 'tail' the frequency from 13 to 18 microns inclusive. The two resulting 
components were: 

T. minus.                          T. majus. 
$m_1 = 20.0817$ (19.83),           $m_2 = 26.4359$ (25.95), 
$\sigma_1 = 2.8685$ (1.85),        $\sigma_2 = 3.6399$ (3.03), 
$n_1 = \ldots$,                    $n_2 = 467.52$. 


The total populations for each component are clearly not very good and their 
combination exceeds by 9.6 % the total observed population; but the means are 
not widely divergent from the average values resulting from our six resolutions, as 
the numbers given in brackets testify. Accordingly I determined to select the 
means of the components at values near the mean values of six reductions, and 
after one or two slight betterments, determine the sizes of the populations and 
their standard deviations so as to give the mean, and second and third moments of 
the observed population. These provided: 

T. minus.                   T. majus. 
$m_1 = 19.8244$,            $m_2 = 26.1122$, 
$\sigma_1 = 2.6439$,        $\sigma_2 = 3.4134$, 
$n_1 = 410.83$,             $n_2 = 589.17$. 
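The statement that the two components reproduce the mean and the second and third moments of the observed population can be checked directly. The sketch below computes the moments of a compound of Gaussian components about its own mean. The function name is illustrative, and two inputs are assumptions: $n_2 = 589.17$ is taken as the figure that brings the two populations to 1000, and $\sigma_2 = 3.4134$ is the value agreeing with the coefficient of variation in the Table on p. 125.

```python
def mixture_moments(components):
    """Moments of a compound of Gaussian components.

    components: list of (n, mean, sd) triples.
    Returns (total, mean, mu2, mu3), the last two taken about the compound mean.
    """
    total = sum(n for n, _, _ in components)
    mean = sum(n * m for n, m, _ in components) / total
    mu2 = 0.0
    mu3 = 0.0
    for n, m, sd in components:
        weight = n / total
        d = m - mean  # distance of the component mean from the compound mean
        mu2 += weight * (sd ** 2 + d ** 2)
        mu3 += weight * (3.0 * sd ** 2 * d + d ** 3)
    return total, mean, mu2, mu3


if __name__ == "__main__":
    # T. brucei components; n2 and sigma2 are the assumptions named above.
    brucei = [(410.83, 19.8244, 2.6439), (589.17, 26.1122, 3.4134)]
    total, mean, mu2, mu3 = mixture_moments(brucei)
    print(f"total = {total:.2f}, mean = {mean:.4f}, mu2 = {mu2:.4f}, mu3 = {mu3:.4f}")
```

Under these assumptions the compound returns a mean close to 23.529 and second and third moments close to the 19.306 and 10.548 quoted for the observed T. brucei distribution.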


* Biometrika, Vol. II. p. 1 and Vol. VI. p. 65. 
The following table gives the observed and calculated values: 

Microns | Observed | Calculated
13      | 5        | 3.44
14      | 8        | 5.80
15      | 14       | 12.25
16      | 17       | 22.79
17      | 40       | 37.05
18      | 63       | 52.87
19      | 55       | 66.68
20      | 66       | 75.44
21      | 63       | 78.43
22      | 75       | 77.49
23      | 87       | 75.61
24      | 93       | 74.71
25      | 80       | 74.36
26      | 82       | 72.74
27      | 72       | 67.98
28      | 50       | 59.71
29      | 38       | 48.10
30      | 27       | 36.04
31      | 26       | 24.79
32      | 18       | 15.67
33      | 11       | 9.09
34      | 4        | 4.84
35      | 4        | …
36      | -        |
37      | -        | 1.75 (36 to 38 together)
38      | 2        |


From these results we find $\chi^2 = 29.92$ and P = .22. Thus more often than 
once in five trials we should get a worse divergence than the observed, if the 
sample were taken from the calculated population. Some endeavour was made to 
better the fit by small variations from the above solution, discussed by least 
squares, but no improvement was effected. The two components are represented 
in Diagram VII (p. 132). 

(iii) T. gambiense. 

Mean = 22.1130, 
$\mu_2 = 14.3389$, $\mu_4 = 531.3585$, 
$\mu_3 = 29.1104$, $\mu_5 = 2429.0948$. 

Reducing nonic: 

$$24q^9 - 71.7810q^7 - 30.5070q^6 - 300.0260q^5 + 869.6372q^4 - 278.8475q^3 - 270.9547q^2 + 58.9108q + 14.6050 = 0.$$

This leads to $p_2 = -10q = -9.1777$, and the components given in the Table 
p. 125. The two Gaussians and their compound are given in Diagram VIII 
(p. 133). We find $\chi^2 = 11.96$, giving for $n' = 18$, P = .80, a splendid fit. 
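The values of P quoted with each $\chi^2$ can be recomputed without tables. The sketch below assumes, as the quoted figures suggest, that P is the upper-tail probability of $\chi^2$ with $n' - 1$ degrees of freedom, $n'$ being the number of groups. It uses the standard finite series for integer degrees of freedom; the function name is illustrative.

```python
import math


def chi_square_p(chi2, n_groups):
    """Upper-tail probability of chi-square with n_groups - 1 degrees of freedom."""
    dof = n_groups - 1
    if dof < 1 or chi2 < 0:
        raise ValueError("need at least two groups and a non-negative chi2")
    half = chi2 / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{k=0}^{dof/2 - 1} (x/2)^k / k!
        term = 1.0
        total = 1.0
        for k in range(1, dof // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(chi/sqrt 2) + sqrt(2/pi) exp(-x/2) * sum chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(chi2)
    term = chi
    total = 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        if r > 1:
            term *= chi2 / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2.0)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


if __name__ == "__main__":
    for label, chi2, groups in [
        ("T. rhodesiense, regrouped", 5.03, 13),
        ("T. gambiense", 11.96, 18),
        ("Mzimba", 19.28, 17),
        ("Wild-game", 12.61, 19),
    ]:
        print(f"{label}: P = {chi_square_p(chi2, groups):.3f}")
```

On this reading the recomputed probabilities fall close to the .957, .80, .26 and .81 printed for the regrouped T. rhodesiense, T. gambiense, Mzimba and wild-game resolutions.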

(iv) Mzimba Strain (from Donkey). 

Mean = 21.4130, 
$\mu_2 = 8.7531$, $\mu_4 = 293.5629$, 
$\mu_3 = 26.6602$, $\mu_5 = 1926.7045$. 

The reducing nonic: 

$$24q^9 + 53.5186q^7 - 25.5876q^6 - 41.5706q^5 - 171.2637q^4 + 227.1211q^3 - 37.3371q^2 - 30.8995q + 8.6177 = 0.$$

The required root is $p_2 = -10q = -4.0000$, which leads to the two components 
given in the Table on p. 125. The two components and their compound curve are 
figured on Diagram IX on this page. We have $\chi^2 = 19.28$, giving for $n' = 17$, 
P = .26, a fairly reasonable fit. 
[Diagram IX. Resolution of the Frequency of the Mzimba Strain into T. minus and T. majus.] 

(v) Wild G. morsitans Strain. 

Mean = 22.6952, 
$\mu_2 = 18.4918$, $\mu_4 = 758.4420$, 
$\mu_3 = 43.0246$, $\mu_5 = 3954.8788$. 

Reducing nonic: 

$$24q^9 - 224.6115q^7 - 66.6402q^6 - 595.9589q^5 + 5079.3305q^4 - 4500.7030q^3 - 1460.5459q^2 + 879.6116q + 152.2340 = 0.$$

The required root is $p_2 = -10q = -4.75085$, which leads to the components 
given in Table on p. 125. These components with their compound curve are 
drawn in Diagram X (p. 136). Here $\chi^2 = 92.75$ which for 20 groups gives 
P < .000,000,1. Thus although the G. morsitans strain breaks up into two 
components the combined curve is not a probable description of the frequency. One 
would like to test another sample of this strain; at present it tells against the 
validity of our reduction. 

(vi) Wild-Game Strain. 

Mean = 22.6220, 
$\mu_2 = \ldots$, $\mu_4 = 404.4932$, 
$\mu_3 = 29.0514$, $\mu_5 = 2247.6657$. 

Reducing nonic: 

$$24q^9 - 18.9446q^7 - 30.3834q^6 - 250.2869q^5 + 851.7475q^4 + 118.6154q^3 - 212.3972q^2 + 15.4222q + 14.4283 = 0.$$

The root required is $p_2 = -10q = -6.9859$. There result the two components 
provided in the Table p. 125. The two components and their compound are 
figured on Diagram XI (p. 137). We find $\chi^2 = 12.61$ giving for $n' = 19$, P = .81, 
an excellent fit. 

(vii) Human Strain. 

Mean = 23.7963, 
$\mu_2 = 17.0252$, $\mu_4 = 713.1660$, 
$\mu_3 = 27.1389$, $\mu_5 = 3034.1222$. 

Reducing nonic: 

$$24q^9 - 131.3796q^7 - 26.5147q^6 - 89.8059q^5 + 964.4176q^4 - 674.2755q^3 - 114.7894q^2 + 81.4492q + 9.5887 = 0.$$

The root is given by $p_2 = -10q = -8.5576$, which leads to the components 
given in the Table on p. 125. The two curves and their compound are figured 
in Diagram XII (p. 138). Although the two components merely from the 
graphical point of view do not give a bad fit, the number of trypanosomes 
involved is so large that the deviations are not reconcileable with random sampling 
from two such components. We find $\chi^2 = 79.67$, giving P < .000,001. 
In order to determine how far heterogeneity of treatment or material might be 
responsible we took further frequencies. In the first place we dealt with the 3600 
measurements for trypanosomes through the rat only. The frequencies are: 

Microns   | 15 | 16 | 17 | 18  | 19  | 20  | 21  | 22  | 23  | 24  | 25  | 26  | 27  | 28  | 29 | 30 | 31  | 32 | 33 | 34 | 35 | 36 | 37 | 38 | Totals
Frequency | 1  | 5  | 47 | 112 | 161 | 216 | 290 | 316 | 376 | 362 | 322 | 294 | 235 | 219 | …  | …  | 108 | 88 | 57 | 28 | 9  | 8  | 1  | 1  | 3600

[Diagram XI. Resolution of the Frequency of the Wild-Game Strain into T. minus and T. majus. Horizontal scale: length in microns.] 
These give: 

Mean = 24.6175, 
$\mu_2 = 15.25897$, $\mu_4 = 602.23008$, 
$\mu_3 = 19.21542$, $\mu_5 = 2023.21556$, 

leading to the reducing nonic: 

$$24q^9 - 80.8739q^7 - 13.2924q^6 - 42.3159q^5 + 306.5227q^4 - 166.4257q^3 - 24.8654q^2 + 12.6008q + 1.2081 = 0,$$

which gives $p_2 = -10q = -7.0031$. 

This provides the two components: 

T. minus.                   T. majus. 
$m_1 = 21.6772$,            $m_2 = 26.9993$, 
$\sigma_1 = 2.2404$,        $\sigma_2 = 3.298\ldots$, 
$n_1 = 1611.18$,            $n_2 = 1988.82$. 

The components and their compound are figured in Diagram XIII, p. 140, 
and we find for $n' = 21$, $\chi^2 = 52.68$ and P = .00016. There has thus been much 
improvement of goodness of fit, although the result is still unsatisfactory. 


It is impossible, however, to look through the graphs given by Sir David Bruce 
and others for the human strains* without being convinced of their fundamentally 
bimodal character, although there appears to be much evidence of its being 
disguised by heterogeneity of host and treatment. 


(viii) Diagram XIV (p. 141) gives the resolution for the human strain from 
Chituluka†. The constants are 

Mean = 26.172, 
$\mu_2 = 23.02260$, $\mu_4 = 1179.30786$, 
$\mu_3 = -37.18226$, $\mu_5 = -3248.43805$, 

leading to the reducing nonic: 

$$24q^9 - 393.8678q^7 - 49.6370q^6 + 520.2910q^5 + 8226.9435q^4 - 12493.5620q^3 - 101.1017q^2 + 855.7520q + 63.2383 = 0.$$

The value of the root is $p_2 = -10q = -16.2295$ and this leads to the 
components given in the Table p. 125, and illustrated in the diagram. The graph 
while giving broadly some of the features of the case is by no means a satisfactory 
fit; for $n' = 21$ groups, $\chi^2 = 86$ and P is < .000,000,1. The diagram suggests that 
we are probably dealing with a mixture of three components with means about 
18.5, 25.5 and 31.0, but at present we have no satisfactory method of performing 
multiple resolutions of this character. 
* R. S. Proc. Vol. 86, B, pp. 285—302. 


† R. S. Proc. Vol. 86, B, p. 291. 
It will be seen that the following strains, T. rhodesiense, T. brucei, T. gambiense, 
the Mzimba, and wild game, give either reasonable or excellent results as combined 
frequencies of T. minus and T. majus. On the other hand the G. morsitans and 
the human strains break up into reasonable pairs of components, but the goodness 
of fit test is not fulfilled. In the case of the human strain, we better matters 
somewhat by taking the strain through the rat only, but the fit is still bad. If we 
confine our attention to a single human being, the case of Chituluka, we still do 
not get a satisfactory fit, although few statisticians could look at the four diagrams 
published by Sir David Bruce and others for Chituluka*, and not recognise the 
character of the material as being at least bimodal. The same applies to the 
Mkanyanga data of an earlier paper†; it is distinctly bimodal. But besides this 
bimodal character there are certain other features in the human data, and to a 
lesser extent in the G. morsitans, which appear to some extent to disguise the 
bimodal features. I am not prepared to assert definitely that this is the appearance 
of a third component. It is of course easy to improve the fit of the distribution 
by the introduction of such a third component, but the remarkable excellence 
of a bimodal resolution for T. rhodesiense, T. gambiense, and the wild-game strain 
makes me hesitate at present to adopt such an expedient. 


Owing to the courtesy of Sir David Bruce (who heard from Sir John Rose 
Bradford that I was much puzzled over the differentiation of strains) I have been able 
to examine a series of drawings of the various strains of trypanosomes. There is 
no other morphological differentiation which impresses itself a priori on the layman 
and statistician, and which might serve as a new measure of the possibility of 
differentiation into T. minus and T. majus. But it occurs to me that an index of breadth to 
length of the nucleus might just possibly serve as a differential character of even 
more importance than the length. It is only a suggestion and considerable caution 
would have to be used in selecting only nuclei not near the dividing stage. But 
it would be of striking interest to see how far the resulting frequency distributions 
for the nuclear indices were or were not bimodal. I think a classification according 
to nuclear index might possibly, to judge from the drawings, cut across the 
forms "intermediate" in length. But this is only a suggestion which may appear 
idle to the student of the subject‡. Some difficulty might also arise from the 
doubt as to whether the index was really greater than 100, or the nucleus as 
a whole had set itself athwart the "length" of the trypanosome. This difficulty 
would certainly have to be considered in the "stumpy" T. brucei and T. gambiense 


* R. S. Proc. Vol. 86, B, pp. 291 to 293. 

† R. S. Proc. Vol. 85, B, p. 428. 

‡ Several students of the subject with whom I discussed the matter stated that they considered the 
nucleus so mobile and so impermanent in form, that a "nuclear index" would prove of little value. 
I think much objection could a priori be raised to the use of the trypanosome "length" on the same 
grounds. The problem is rather, whether in dealing with large numbers we do reach an average type. 
It would only be possible a posteriori to justify the use of a nuclear index, i.e. if it were found to differ 
sensibly from one pure strain to a second, and if it confirmed in such cases as T. rhodesiense resolutions 
based on length frequencies. 
forms, but I am inclined to think that the index really passes through the value 
100. Undoubtedly this range of index, or possible athwartness of the nucleus, is 
not conspicuous in the simple strains like T. pecorum, T. simiae and T. caprae. 


Conclusions. (i) If appeal be made to statistical measurements, judgment 
between identity and diversity of strain must be formed by means of accepted 
statistical processes and not by mere comparison of graphs. 


(ii) Statistical processes show that the conclusions already formed as to the 
identity of trypanosome strains from mere inspection of the graphs cannot be 
confirmed. 


(iii) There must be some standardised process of treatment both in regard to 
host, and to method of and stage of infectivity at extraction. 


(iv) Even making allowance for differences due to host and treatment, we 
find remarkable divergences in the very strains asserted to be identical. 


(v) It would appear that some order would be brought into the chaos, if we 
could consider the strains described as T. brucei, T. rhodesiense, T. gambiense, the 
wild-game, the Mzimba, and very probably the tsetse fly and the human strains 
as really consisting of two components, which for the time I have termed 
T. minus and T. majus. It is highly desirable that additional measurements should 
be made (? a nuclear index ascertained) to determine whether these lead also 
to similar components. 


I do not assume that this is a final solution of the problem, nor do I assert that 
T. minus and T. majus represent necessarily, although probably, distinct strains ; 
they may be dimorphic forms of one and the same strain occurring in different pro- 
portions. But, I believe, that the suggestion of their existence may help to explain 
some anomalies of the present chaos. I ought also to state quite frankly that this 
paper is not written in a merely critical spirit. I believe that the trypanosome 
workers have undertaken in their elaborate systems of measurements most laborious 
and most valuable work, but, I think, the time has now come when without 
trained statistical aid, but little further progress will be made in a very important 
and urgent matter. 


The very large amount of arithmetical work in this paper would never have 
got carried through had I not had the ever ready assistance of my colleague 
Miss Julia Bell; to Mr H. E. Soper also I owe help in the arithmetical work, but 
I have to thank him in particular for the careful preparation of the diagrams, and 
the planimetric determination of their frequencies by aid of which the $\chi^2$ for all 
but two of the compound curves was found. In the case of T. brucei and 
T. rhodesiense actual calculation of the areas of the normal curves was used. 


ON HOMOTYPOSIS AND ALLIED CHARACTERS 
IN EGGS OF THE COMMON TERN 


By WILLIAM ROWAN, K. M. PARKER, B.Sc., AND JULIA BELL, M.A. 


(1) Origin of the material and method of measurement. 


The settlement of Common Terns, which provided material for the present 
work, is one of old establishment on Blakeney Point, Norfolk. This is a shingle 
spit of some 8 miles in length on the north coast of Norfolk, about 12 miles 
west of Cromer. The colony is situated on the very end of the point, with 
water on three sides. Here the spit is a combination of dunes, salt marsh and 
shingle, and for the most part the nests are found on the open shingle on the 
seaward side of the dunes. Nests are plentiful in the embryo dunes in some 
years, though this year (1913) none were found there. The colony was more 
scattered than usual and covered the greater part of a mile of sea front. To 
avoid missing any clutches, Miss K. M. Parker, B.Sc., and Mr William Rowan 
divided the nesting area into suitable well marked plots and worked these one 
after another. Each of these again were worked in strips, till a patch was com- 
pleted, when the workers moved on to a remote one, to give the birds a chance of 
settling down again. After measurement each egg was numbered with indelible 
ink, so that any one egg was never measured twice. In all 203 clutches were 


handled. 
(2) Reduction of the material. 


The principal part of the work of tabling and reduction was carried out by 
Julia Bell*. The characters dealt with were: 


(i) Length of Egg: $L$ 
(ii) Breadth of Egg, maximum value: $B$ 
(iii) Lateral Girth at section with maximum breadth: $G_b$ 
(iv) Longitudinal Girth: $G_l$ 
(v) Length-Breadth Index: $B/L$ 
(vi) Mottling, as determined from a scale of nine eggs: $M$ 
(vii) Ground Colour, as determined from a tint scale: $C$ 


* The authors have to thank Miss B. M. Cave for certain tables and their correlation coefficients. 
The Editor is responsible for the actual wording of this paper. 
The Length of egg $L$ may be considered as the easiest character to determine 
and needs no further comment. 


The Breadth of egg $B$ should be closely related to the Lateral Girth $G_b$, and 
in most cases the relationship $G_b = \pi B$ is very closely satisfied. If we sum and 
take the means we have 

$\pi$ = Mean Lateral Girth / Mean Breadth. 

This gives in the present material: 

$\pi = 3.224$ as against 3.142, 

which marks an error of about 2.6 %, rather larger than we might anticipate, and 
possibly due to the inclusion of a certain number of slightly damaged eggs, and 
the measurement of the eggs in the field and not in the laboratory. The relation 
between $G_b$ and $B$ is a useful test of accuracy and should be determined with a 
slide rule before the egg is finally replaced in the nest, or lost sight of. 
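The field check described above amounts to comparing the ratio of lateral girth to breadth with $\pi$. A minimal sketch follows; the 5 % tolerance is a hypothetical threshold chosen for illustration only and is not given in the paper, and the function names are likewise assumptions.

```python
import math


def girth_breadth_ratio(lateral_girth, breadth):
    """Ratio of lateral girth to maximum breadth, which should lie near pi."""
    if breadth <= 0:
        raise ValueError("breadth must be positive")
    return lateral_girth / breadth


def needs_remeasurement(lateral_girth, breadth, tolerance=0.05):
    """True when the ratio departs from pi by more than `tolerance` (a hypothetical 5 %)."""
    ratio = girth_breadth_ratio(lateral_girth, breadth)
    return abs(ratio - math.pi) / math.pi > tolerance


if __name__ == "__main__":
    # Mean lateral girth and mean breadth of Table I, in centimetres.
    print(f"ratio of means = {girth_breadth_ratio(9.59, 2.98):.3f}")
    # Relative excess of the quoted 3.224 over pi, in per cent.
    print(f"excess of 3.224 over pi = {100.0 * (3.224 / math.pi - 1.0):.1f} %")
```

The quoted 3.224 exceeds $\pi$ by about 2.6 %, as stated; the ratio of the rounded means of Table I comes out slightly lower, near 3.22.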


The Longitudinal Girth $G_l$ is somewhat more difficult to measure, and a rough 
test of its accuracy not so easy to determine as in the case of $G_b$. We have, 
however, developed a formula for determining $G_l$ in terms of $B$ and $L$, and on testing 
it we find that as a rule the differences are below 1.5 mm. Such a formula may 
be useful as emphasising the need for remeasurements, when the observed and 
calculated girths have differences much in excess of 1.5 mm. We are not prepared 
to say, however, that the coefficients in this formula can be extended beyond 
the case of the Common Tern. 


While the Length-Breadth Index is valuable as giving a measure of the 
ellipticity of the egg, it is not of much influence on the apparent oval shape, 
unless we suppose some theoretical geometrical construction for the egg. If we 
suppose the blunt end of the egg to be approximately spherical, the hemisphere 
ending with the maximum breadth, then the egg might be considered as divided 
into two portions, the upper or hemispherical with radius $\tfrac{1}{2}B$ and the lower with 
length from the base of the hemisphere (or 'equator') to the lower pole $= L - \tfrac{1}{2}B$. 
The ratio of these two segments of the length depends only on the index $B/L$. 
Thus it is conceivable that this index has actually as much association with 
ovality as with ellipticity, although without some geometric theory of egg-shape, 
we are not able to make any dogmatic assertion as to the value of $B/L$. It 
seems, however, a character of considerable interest as being free of absolute size 
and also some measure of shape. If $I = B/L$ and $O$ be the ratio of $\tfrac{1}{2}B$ to $L - \tfrac{1}{2}B$, 
i.e. 

$$O = \frac{B/L}{2 - B/L} = \frac{I}{2 - I},$$

we may consider $O$ a measure of the ovality, and we 
have correlated $O$ for eggs of the same clutch as well as $I$. Of course, since $O$ is 
a function of $I$, there will be relatively little difference in the results. 
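The index of ovality follows from the breadth and length alone. The sketch below computes it both from the two segments of the length and from the Length-Breadth Index, each expressed as a percentage to match Table I; the function names are illustrative.

```python
def ovality_from_dimensions(breadth, length):
    """O = (B/2) / (L - B/2), as a percentage."""
    lower_segment = length - breadth / 2.0
    if lower_segment <= 0:
        raise ValueError("length must exceed half the breadth")
    return 100.0 * (breadth / 2.0) / lower_segment


def ovality_from_index(index_percent):
    """O = I / (2 - I), with the Length-Breadth Index I given as a percentage."""
    if not 0 < index_percent < 200:
        raise ValueError("index must lie between 0 and 200 per cent")
    return 100.0 * index_percent / (200.0 - index_percent)


if __name__ == "__main__":
    # Mean breadth and length of Table I (cm), and the mean index 72.04.
    print(f"from dimensions: {ovality_from_dimensions(2.98, 4.14):.2f}")
    print(f"from mean index: {ovality_from_index(72.04):.2f}")
```

Applied to the mean index 72.04 this gives about 56.30, close to but not identical with the tabled mean ovality of 56.35, since the mean of a function of $I$ is not the function of the mean.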
The mottling is a far more difficult matter for determination. The points 
which may be considered are: 


(i) Size and shape of individual splodges. 
(ii) Portion of the egg over which these splodges are distributed. 
(iii) Area of mottled surface as compared with whole area of the egg. 


The fieldworkers selected 9 typical mottlings (see Plate IX) and named these 
a, b, c, d, e, f, g, h, i; they then compared each recorded egg with these and selected 
the letter which marked the egg on the scale most resembling the egg to be 
recorded. There is little doubt that in this manner they divided the whole series 
of eggs into differentiated classes. But it may be doubted whether the judgment 
made depended on one only of the above three characteristics. Hence when we 
came to arrange the eggs a, b, c, d, ... h, i on a scale of mottling, we found that the 
order would not be the same when we classified in turn by each of the three 
characteristics. We endeavoured to place the eggs in order by extent of mottling, 
i.e. by (iii), but we think that the relatively low value of the homotyposis which 
has resulted is possibly due to size and shape of the mottlings, (i), having had 
as much influence on the classification as the extent of area mottling. Even 
position on the egg, (ii), can influence judgment considerably. We believe that 
in future work on eggs, it would be desirable to classify the mottling of each 
by using the three characteristics independently. Even then an ocular 
appreciation, as this must be, may fail to give a very close measure of the nature of the 
mottling and thus weaken any homotypic correlation. 


The Ground Colour of these eggs varies through all shades of brown to 
brownish greens and blue-greens. The fieldworkers attempted to give the value or 
depth of ground-colour pigmentation without regard to the brown or green shade 
of colouring. The scale of values is given at the foot of Plate VIII. 


A point seemed worth consideration: assuming the pigments to be deposited 
on the egg in its passage through the oviduct, it was conceivable that greater 
pressure might indicate greater intensity of pigmentation. We accordingly 
selected the broader egg in each clutch and investigated for every pair of eggs 
from the same clutch whether the broader or narrower egg had the larger mass 
of mottling and greater density of ground colour. We reached the following 
results : 


The broader egg in every possible clutch-pair has 


Greater mottling in 26 cases | More dense ground colour in 25 cases 
The same in … cases          | The same in 39 cases 
Less in 40 cases             | Less dense in … cases 


Perhaps not very much stress is to be laid on these results, but they suggest 
that the total amount of pigment deposited is less the broader the egg, i.e. for the 
same bird a relatively smaller egg will be more pigmented. A solution of this 
rather unexpected result may, perhaps, be found in the suggestion that the total 
amount of pigment is the same in both eggs, but the mottling and ground colour 
will appear denser on the smaller surface of the smaller egg. The point deserves 
consideration on the basis of larger numbers and possibly better defined measures 
of pigmentation. 


(3) Means and Variability. 


Table I gives the means, standard deviations and coefficients of variation of 
the several characters studied. It will be seen that the tern’s egg has for 
quantitative characters relatively small variation. The values of the coefficients 


TABLE I. 


Means and Variabilities (Absolute Measurements in Centimetres). 


Character | Mean SU ease 
| Deviation Variation 
| 
Length ZL ae .. | 4:14+:007 180 + -005 4°34 + "12 
Breadth 2B ‘ cog 2°98 + -004 099 + :010 3°33 4°09 
| Girth G... ae ae 11°39 + ‘015 376 + 010 3°30 + °09 
_ Girth G, mbt .. | 9°59+-014 347 + 010 3°62 4°10 
| Index B/L ~ Lad 72°04 4 136 3°449 + 096 [4°79+ °13] 
| Index of Ovality, O* .... | 56°35+°171 4:334+°121 [7°69 + °22] | 


of variation are less than many of those which we find for the human skull 
(3 to 8), but greater than those we know for the wing of the wasp. It is very 
doubtful whether the coefficients of variation of the indices should be included 
in such considerations, for the object of the use of these coefficients is to get 
rid of absolute lengths, and this is already done in the case of indices†. It is
noteworthy that the length of the egg is only slightly more variable than the 
breadth and the breadth-girth is actually more variable than the length-girth. 
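
As a rough check on the last column of Table I, the coefficient of variation can be recomputed as 100 × standard deviation / mean. The sketch below assumes the means and standard deviations are as read from the table; the dictionary keys and the function name are illustrative only.

```python
# Means and standard deviations as read from Table I
# (centimetres for the measurements, index units for the two indices).
TABLE_I = {
    "Length L": (4.14, 0.180),
    "Breadth B": (2.98, 0.099),
    "Longitudinal girth": (11.39, 0.376),
    "Equatorial girth": (9.59, 0.347),
    "Index 100 B/L": (72.04, 3.449),
    "Index of ovality O": (56.35, 4.334),
}


def coefficient_of_variation(mean, sd):
    """Return 100 * sd / mean, the coefficient of variation in per cent."""
    return 100.0 * sd / mean


cv = {name: coefficient_of_variation(m, s) for name, (m, s) in TABLE_I.items()}

for name, value in cv.items():
    print(f"{name:20s} {value:5.2f}")
```

The recomputed values agree with the tabled coefficients to within rounding, and they reproduce the two remarks in the text: length is slightly more variable than breadth, and the equatorial (breadth) girth is more variable than the longitudinal girth.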


(4) Correlations. 


If we turn to the correlation of characters in the same egg, we note that
while the ordinary product-moment correlation r has been calculated for all
measurable pairs of characters, this is not possible for the ground colour or the
mottling. Where mottling has been used with a quantitative character, there η
has been calculated and both corrections used. Where mottling has been
considered in conjunction with ground colour, there we have adopted mean square
contingency, correcting for both number of cells and for class-index correlations.
* O = (B/L)/{2 − (B/L)}.

† For example, if we take 1/O for our index of ovality its mean = 176.32, the standard deviation
= 11.24 and the coefficient of variation = 6.38. Is O or 1/O the more variable? It does not seem that
the coefficient of variation can help us in such a problem.
Certain facts are at once obvious from this Table, others are obscured. In
the first place length and breadth of the egg of the Common Tern have a rela- 
tively small relationship, while the relationship between the two girths is between 


TABLE II.

Correlations of Characters in the same Egg.

Characters | Symbols | Correlation | Remarks
Length and Breadth | L, B | +.2220 ± .0374 | -
Longitudinal and Equatorial Girths | G_l, G_b | +.5297 ± .0284 | -
Length and Longitudinal Girth | L, G_l | +.8804 ± .0088 | -
Breadth and Longitudinal Girth | B, G_l | +.5216 ± .0286 | -
Index and Longitudinal Girth | B/L, G_l | -.3832 ± .0336 | -
Index and Length | B/L, L | -.7284 ± .0185 | -
Index and Breadth | B/L, B | +.5033 ± .0294 | -
Mottling and Ground Colour | M, C | +.2260 (corrected C_2) | More mottling, deeper ground colour
Mottling and Index | M, B/L | -.1550 (corrected η) | Less mottling, higher index
Mottling and Breadth | M, B | -.1803 (corrected η) | Less mottling, greater breadth
Ground Colour and Index | C, B/L | .0000 (corrected η) | No relationship
Ground Colour and Breadth | C, B | -.1506 (corrected η) | Fainter ground colour, greater breadth


two and three times as great. This probably flows from the consideration that
the correlation of $G_l$ and $G_b$ arises from B being a factor in both and only
secondarily from the correlation between L and B. The correlation of the
longitudinal girth with egg length is 60% higher than that of longitudinal
girth with egg breadth; both these correlations are more substantial than that
of the longitudinal girth, $G_l$, with the egg index, B/L. The egg index correlated
with length is large and negative, and with breadth considerable and positive,
precisely the results we should anticipate would appear if the correlation were
largely spurious*.


In order to ascertain how far it was possible to predict the longitudinal girth
from length and breadth, double (for L and B) and triple (for L, B and B/L)
regression formulae were worked out. The following equations resulted, I denoting
the index 100 B/L:

(i) $G_l - \bar{G}_l = 1.2701\,(B - \bar{B}) + 1.6415\,(L - \bar{L})$,

or, $G_l = 1.2701\,B + 1.6415\,L + .8224$,

and (ii) $G_l - \bar{G}_l = -17.2930\,(B - \bar{B}) + 14.6374\,(L - \bar{L}) + .7636\,(I - \bar{I})$,

or, $G_l = -17.2930\,B + 14.6374\,L + .7636\,I - 52.7239$.


The first seventeen eggs were taken as a random set to test these results upon 
with the following values: 


* As a matter of fact the correlation of index and length for a constant breadth is —-997 and 
of index and breadth for a constant length is +-996 instead of unity. These values indicate how closely 
the linearity of regression holds in these quantitative measurements. 
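
The two values in the footnote above can be approximately recovered from Table II, on the assumption (not stated in the text) that they are ordinary first-order partial correlations. The sketch below uses the tabled correlations of length, breadth and index; the function and variable names are illustrative.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


# Correlations as read from Table II (I stands for the index 100 B/L).
R_LB = 0.2220   # length and breadth
R_IL = -0.7284  # index and length
R_IB = 0.5033   # index and breadth

# Index and length for constant breadth.
index_length_const_breadth = partial_correlation(R_IL, R_IB, R_LB)
# Index and breadth for constant length.
index_breadth_const_length = partial_correlation(R_IB, R_IL, R_LB)

print(round(index_length_const_breadth, 3), round(index_breadth_const_length, 3))
```

With four-figure inputs the results fall within about .001 of the quoted -.997 and +.996; the small residual difference is presumably due to rounding in the tabled correlations.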
TABLE III.

Observed and Calculated Longitudinal Girths.

Egg Number | Observed Girth G_l | Calculated Girth (ii) | Calculated Girth (i) | Difference Δ(ii) | Difference Δ(i)
1 | 11.40 | 11.14 | 11.20 | + .26 | + .20
2 | 11.65 | 11.83 | 11.74 | - .18 | - .09
3 | 12.10 | 12.24 | 12.07 | - .14 | + .03
4 | 10.80 | 11.46 | 10.84 | - .66 | - .04
5 | 11.70 | 11.23 | 11.31 | + .47 | + .39
6 | 11.20 | 11.27 | 11.34 | - .07 | - .14
7 | 12.15 | 13.19 | 12.31 | - 1.04 | - .16
8 (i) | 11.20 | 11.19 | 11.27 | + .01 | - .07
8 (ii) | 11.30 | 11.09 | 11.27 | + .21 | + .03
9 (i) | 11.50 | 11.44 | 11.61 | + .06 | - .11
9 (ii) | 11.40 | 11.36 | 11.45 | + .04 | - .05
10 | 11.50 | 11.52 | 11.61 | - .02 | - .11
11 | 11.80 | 11.55 | 11.72 | + .25 | + .08
12 | 11.90 | 11.62 | 11.74 | + .28 | + .16
13 (i) | 11.10 | 11.01 | 10.94 | + .09 | + .16
13 (ii) | 10.80 | 10.75 | 10.78 | + .05 | + .02
13 (iii) | 11.70 | 11.45 | 11.55 | + .25 | + .15
Root mean square Δ | | | | .354 | .146


To judge by this small sample we obtain only increased inaccuracy by taking
the more complicated formula. We shall only make an error of about 1½ mm. if
we calculate the longitudinal girth from


$G_l = 1.2701\,B + 1.6415\,L + .8224$,


and for the egg of the Common Tern at least this is a convenient formula for 
verifying measurements in the field. 
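
A minimal sketch of how the simpler formula might be used as a field check, together with a recomputation of the two root-mean-square differences of Table III. The difference lists are as read from that table (observed minus calculated, in centimetres); a few entries are hard to read in the scanned table and are taken here as the observed girth minus the calculated girth. The function names are illustrative.

```python
from math import sqrt


def girth_from_length_breadth(length_cm, breadth_cm):
    """Formula (i): longitudinal girth (cm) predicted from length and breadth (cm)."""
    return 1.2701 * breadth_cm + 1.6415 * length_cm + 0.8224


# Observed minus calculated girth for the seventeen test eggs, formula (i).
DIFF_I = [0.20, -0.09, 0.03, -0.04, 0.39, -0.14, -0.16, -0.07, 0.03,
          -0.11, -0.05, -0.11, 0.08, 0.16, 0.16, 0.02, 0.15]
# The same for the triple regression formula (ii).
DIFF_II = [0.26, -0.18, -0.14, -0.66, 0.47, -0.07, -1.04, 0.01, 0.21,
           0.06, 0.04, -0.02, 0.25, 0.28, 0.09, 0.05, 0.25]


def root_mean_square(values):
    """Square root of the mean of the squared values."""
    return sqrt(sum(v * v for v in values) / len(values))


rms_i = root_mean_square(DIFF_I)
rms_ii = root_mean_square(DIFF_II)

# Girth predicted for an egg of mean length and mean breadth (Table I).
mean_egg_girth = girth_from_length_breadth(4.14, 2.98)

print(round(rms_i, 3), round(rms_ii, 3), round(mean_egg_girth, 2))
```

On these figures the simpler formula gives a root-mean-square difference of about .146 cm against about .354 cm for the triple formula, and it returns a girth close to the tabled mean of 11.39 cm for an egg of mean dimensions.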


The remaining correlations indicate sensible correlations, but these correlations 
might well be substantially higher had a better scale of mottling been adopted 
ab initio. In the first place we see that the mottling and the ground colour 
are sensibly correlated, and the deeper the ground colour the more intense is 
the mottling*. 


We have already seen (p. 146) that for eggs of the same clutch the broader 
has less intensity of ground colour and more meagre mottling. This is true 
for the eggs of the Common Tern in general, although it is probable that a better 
classification of mottling would bring out more marked correlations. The 


* This might probably be asserted interracially as well as intraracially, compare for example the 
swallow with the skylark, the lapwing with the ringed plover, etc. 
following are the orders (a) of mottling chosen, (b) of breadth classes, (c) of
index classes:

(a) Order of Mottling | (b) Order of Breadth | (c) Order of Index
Class | Class, B | Class, B/L
g+e+d | a, 3.00 | a, 72.64
a | c, 2.99 | f+i, 72.54
b | g+e+d, 2.97 | c, 72.30
c | f+i, 2.96 | g+e+d, 72.27
h | h, 2.96 | h, 71.95
f+i | b, 2.95 | b, 70.54
- | Mean, 2.98 | Mean, 72.30


The relationship is small, but exists. It seems reasonable to suppose that 
the order of mottling classes as given by B or B/L, where there is only one 
displacement, may be a better one than that we have selected. But if in the 
mottling order b and c were interchanged, it would agree with the B classification, 
in so far that the three classes of least and of most mottling in the two classi- 
fications would be the same. 


We now turn to the ground colour. We see that the ground colour is 
fainter, when the egg has greater breadth, but that there is no relation of the 
index to the intensity of ground colour. The results of p. 147 are thus confirmed 
by the general correlation of ground colour and breadth. Although there is no 
high correlation, we may assert that it is probable that the intensity of pigment
does not depend on the pressure during transit of the oviduct, but rather on
a constant amount of pigment being distributed over a larger surface. 


(5) Homotyposis in Eggs of the same Clutch. 


The homotyposis, or degree of resemblance in character between eggs of the 
same clutch may be studied on the present material. The chief direct and cross 
homotypic correlations are given in Table IV. 


Pearson has shewn* that the degree of resemblance of undifferentiated ‘like 
organs’ might be expected to be equal to that of pairs of brethren, i.e. about .50,
and proved that this is so for many homotypes in the vegetable kingdom, a result 
which has been since confirmed by much as yet unpublished material from the 
animal kingdom, including a number of series of birds’ eggs. Thus the mean 
value of the homotyposis for eggs of the Common Tern could hardly be improved 
upon. Only the colour characters show irregularity, especially the mottling, a 


* “On Homotyposis in the Vegetable Kingdom,” Phil. Trans. Vol. 197, A, pp. 285–379, 1900.
feature we have already indicated as difficult to measure. It will be seen that
the correlation of the ground colour of an egg with the mottling of a second
(.3989) has come out greater than the organic correlation between mottling and
ground colour in the same egg (.2260).


TABLE IV.

Homotypic Correlations.

Group | Symbols | Characters | Correlation
Direct | L, L | Lengths of Eggs in same clutch | .4643 ± .0346
Direct | B, B | Breadths of Eggs in same clutch | .5176 ± .0326
Direct | G_l, G_l | Longitudinal Girths of Eggs in same clutch | .5076 ± .0327
Direct | G_b, G_b | Equatorial Girths of Eggs in same clutch | .4621 ± .0350
 | | Mean value | .4879
Direct | M, M | Mottling of Eggs in same clutch | .3500
Direct | C, C | Ground colour of Eggs in same clutch | .5709
 | | Mean of six characters | .4788
Cross | L, B | Length of one Egg with Breadth of a second | .0922 ± .0441
Cross | C, M | Ground colour of one Egg with Mottling of a second | .3989 ± .0379
Cross | L, G_l | Length of one Egg with Longitudinal Girth of a second | .4229 ± .0362
Cross | B, G_l | Breadth of one Egg with Longitudinal Girth of a second | .2530 ± .0416
Cross | G_l, G_b | Longitudinal Girth of one Egg with Equatorial Girth of a second | .2603 ± .0413
Index | B/L, B/L | Indices of two Eggs of same clutch | .5537 ± .0308
Index | O, O | Indices of ovality of two Eggs of same clutch | .5527 ± .0309
Index | 1/O, 1/O | Inverse of indices of ovality | .5361 ± .0317
 | | Mean of three Index Correlations | .5475
 | | Mean of nine Homotypic Correlations | .5017
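
The mean values quoted in Table IV can be checked by simple averaging. The sketch below assumes the longitudinal-girth correlation is .5076 (the value repeated in Table V) and the first index correlation is .5537, since both are partly garbled in the scanned table; the variable names are illustrative.

```python
# Direct homotypic correlations for the four measured characters
# (length, breadth, longitudinal girth, equatorial girth).
direct_measured = [0.4643, 0.5176, 0.5076, 0.4621]
# Direct homotypic correlations for mottling and ground colour.
direct_colour = [0.3500, 0.5709]
# Homotypic correlations of the three indices (B/L, O, 1/O).
index_correlations = [0.5537, 0.5527, 0.5361]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


mean_four = mean(direct_measured)
mean_six = mean(direct_measured + direct_colour)
mean_index = mean(index_correlations)
mean_nine = mean(direct_measured + direct_colour + index_correlations)

print(round(mean_four, 4), round(mean_six, 4), round(mean_index, 4), round(mean_nine, 4))
```

On this reading the four means agree with the tabled .4879, .4788, .5475 and .5017 to within rounding, the last being close to the value of about .50 expected for undifferentiated like organs.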


We feel that the classification by mottling is at present too uncertain, and 
that until the result cited has been confirmed with larger numbers and more 
definite categories, it would be idle to consider whether, while a given bird has 
usually highly or lowly pigmented eggs both as to ground colour and mottling, 
yet when in the individual egg there is an excess of mottling pigment, there 
may be some tendency to a relatively less increase of ground colour. Thus the 
correlation in the individual egg might possibly be less than the correlation 
between eggs of the same clutch. Such considerations must be postponed until 
the fact itself is adequately demonstrated. 
Another relation suggested by Pearson* is that the cross homotypic correlation
of the characters x and y should on the average equal ½ (correlation of
x and x + correlation of y and y) × (the organic correlation of x and y). It is
clearly impossible from what has just been said to apply this to the cross 
homotyposis of ground colour and mottling. We can apply it to the five cases 
in which quantitative measurements have been made. Table V gives the 
requisite data, the last two columns giving respectively the calculated and 
observed cross correlations. 


TABLE V.

Cross Homotypic Correlations.

Characters (1), (2) | Direct Correlation (1) and (1) | Direct Correlation (2) and (2) | Organic Correlation (1) and (2) | Cross Correlation, Calculated | Cross Correlation, Observed
L, B | .4643 | .5176 | .2220 | .1090 | .0922
L, G_l | .4643 | .5076 | .8804 | .4278 | .4229
G_l, G_b | .5076 | .4621 | .5297 | .2568 | .2603
B, G_l | .5176 | .5076 | .5216 | .2674 | .2530
G_l, B/L | .5076 | .5537 | -.3832 | -.2083 | -.2007


When we compare the calculated and observed cross correlations, we see 
a striking agreement, or the theory that cross homotyposis is the product of 
direct homotyposis and the organic correlation of the characters under investi- 
gation holds very closely for the egg of the Common Tern. 
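
The calculated column of Table V follows from the relation quoted above: cross correlation = ½ (direct correlation of the first character + direct correlation of the second) × organic correlation. The sketch below recomputes it from the direct and organic correlations as read from Tables II and IV; the tuple layout and names are illustrative, and the direct index correlation is taken as .5537.

```python
def expected_cross(direct_x, direct_y, organic_xy):
    """Cross homotypic correlation predicted from the two direct
    homotypic correlations and the organic correlation."""
    return 0.5 * (direct_x + direct_y) * organic_xy


# (characters, direct x, direct y, organic x-y, observed cross correlation)
ROWS = [
    ("L, B", 0.4643, 0.5176, 0.2220, 0.0922),
    ("L, G_l", 0.4643, 0.5076, 0.8804, 0.4229),
    ("G_l, G_b", 0.5076, 0.4621, 0.5297, 0.2603),
    ("B, G_l", 0.5176, 0.5076, 0.5216, 0.2530),
    ("G_l, B/L", 0.5076, 0.5537, -0.3832, -0.2007),
]

calculated = {name: expected_cross(dx, dy, org) for name, dx, dy, org, _ in ROWS}
observed = {name: obs for name, _, _, _, obs in ROWS}

for name in calculated:
    print(f"{name:9s} calculated {calculated[name]:+.4f}  observed {observed[name]:+.4f}")
```

For the first four pairs the recomputed values match the tabled calculated values to four figures; in every case the calculated and observed cross correlations differ by less than .02.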

The general results obtained are in good accord with those reached by previous 
observers, and the authors hope to investigate one or two doubtful points on 
fuller material this year. 


* Phil. Trans. Vol. 197, A, p. 290. 
APPENDIX OF CORRELATION TABLES.

TABLE A. Length and Breadth of Egg.
TABLE D.

Breadth of Egg and Girth L.

[Table entries illegible in the scanned source.]


TABLE E. Girth L. and Index 100 Breadth/Length.


156 On Homotyposis in Eggs of the Common Tern 


[Table entries illegible in the scanned source.]

W. Rowan, K. M. Parker and J. Bell


TABLE G. Breadth of Egg and Index 100 Breadth/Length.

Index 100 Breadth/Length and Mottling.

Breadth of Egg in Pairs of same Clutch.

TABLE R. Mottling in Pairs of Eggs of same Clutch.

In Tables R–T, the contingency of each cell is given in brackets.

TABLE T. Ground Colour in Pairs of Eggs of same Clutch.

Length Girth of Second Egg.

Girth L. in Pairs of same Clutch.

Breadth and Girth L. in Pairs of same Clutch.

TABLE X. Girth L. and Girth B. in Pairs of same Clutch.

Index of Second Egg.
MISCELLANEA.


I. The Statistical Study of Dietaries, a reply to 
Professor Karl Pearson. 


By Professor D. NOEL PATON, F.R.S.


PROFESSOR PEARSON’S criticism of Miss Lindsay’s Study of the Diets of the Labouring
Classes in the City of Glasgow (Biometrika, Vol. IX. Oct. 1913) is a good example of the
danger of one who does not understand the problems involved and who is ignorant of the 
work already done upon a subject attempting to discredit the results of an investigation by 
the application of mathematics according to his own fancy and in, what seems to me, a 
totally illegitimate manner. 


Not appreciating the questions which were under investigation, he starts his criticism by 
demanding that our studies should afford a solution of problems other than those we had 
before us, and, because he does not find the solution of these problems, he proceeds to abuse 
the work. 


Apparently in his opinion the object of the studies should have been to determine what 
effect the diets which the families were taking at the time of the study had upon the 
physique of the various individuals. He states that, if adequate anthropometric observations 
had been secured in such a study, it would have been at once possible to co-relate these 
with the diets. It is unnecessary to point out, as was pointed out in the Report, that the 
physique is determined by the whole previous condition of life and by the influence of 
heredity, and that it is absurd to attempt to relate it solely to the diet (Report, pp. 3 and 4). 


The objects of the studies are quite clearly stated on p. 4 of the Report: “Do the 
working classes of this city get such a diet as will enable them to develop into strong, 
healthy, energetic men, and, as men, will enable them to do a strenuous day’s work ; or are 
the conditions of the labouring classes such that a suitable diet is not obtainable? Further, 
if a suitable diet is obtainable, and is obtained, is it procured, or can it be procured, at a 
cost low enough to leave a margin sufficient to cover the other necessary expenses of the 
family life, with something over for those pleasures and amenities without which the very 
continuance of life is of doubtful value?” 


It was accepted as proved by previous work that for the labouring classes: “If a family 
diet...... gives a yield of energy of less than 3500 Calories per man per day it is insufficient 
for active work, and if less than 3000 it is quite inadequate for the proper maintenance of 
growth and normal activity.” 


The first question investigated was: “Did the families examined receive this supply of 
energy?” As regards the poorest classes this was answered in the negative. The validity of 
this conclusion has not been challenged by Professor Pearson.

The second question considered was whether the diets contained a sufficient supply of
protein. Previous work indicates that this is probably something above 110 grms. per man 
per diem. It was shown that in families with regular incomes of over 20s. a week the 
average protein intake was above 110 grms., and that in families with regular incomes and 
in those with irregular incomes of under 20s. a week the average protein intake was under 
110 grms. This conclusion has not been refuted. 


Accepting our premises, the final conclusion was (p. 27) “that while the labouring classes 
with a regular income of over 20s. a week generally manage to secure a diet approaching the 
proper standard for active life, those with a smaller income and those with an irregular 
income entirely fail to get a supply of food sufficient for the proper development and growth 
of the body and for the maintenance of the capacity for active work.” 


The main points proposed for the study were thus elucidated.


The part of the Report to which Professor Pearson specially directs his criticism is not 
the main problem, but that dealt with on pp. 30 and 31—The Physique of Children in 
Relationship to Diet, a subject taken up at the suggestion of Dr Chalmers. Professor Pearson, 
having declared the data totally insufficient, proceeds to apply his statistical methods not to 
refute Miss Lindsay’s conclusion, but to demolish other conclusions upon the relationship of 
physique to income which were never deduced by us. 


The very guarded conclusion in the Report was: “These show very markedly the relation- 
ship between the physique and the food. When the weight is much below the average for that 
age almost without exception the diet is inadequate.” 


Weights alone were considered. Thirty-six children, boys and girls, were dealt with. As 
the relationship of weight to income was not under consideration, they were classified not 
according to the income but according to the energy value of the family diet. Hence 
Professor Pearson’s remarks upon this point are quite beside the mark. 


I give below, in a re-arranged form, the Table from Appendix IV. The individuals are 
placed in two groups according to the energy value of their diets, with, opposite each child, 
the average weight for the age, taken from the Report of the Anthropometric Committee 
published in the Transactions of the British Association for the Advancement of Science, 
1883, and with the difference between the weight of the child and the average weight. The 
differences between groups 1 and 2 are sufficiently marked and warrant the conclusion as 
stated above. 


That is, of the children in families the diets of which yielded more than 3000 Calories per 
man per day: 
10 were above the standard or not more than 5 lbs. below it, 
8 were more than 5 lbs. below it, 


while of the children in families in which the diet yielded less than 3000 Calories 


3 were above the standard or not more than 5 lbs. below it, 
15 were more than 5 lbs. below it. 


It must be remembered that the ‘standard’ is for the children of all classes and not for 
those of the poorer classes. 


The fact that the average age of the children in the second group was about 1½ years
greater than that of the children in the first group does not account for the marked 
difference. 


The last question which Miss Lindsay had to consider was, how the necessary supply of 
energy and of protein might be supplied without increased expenditure, and she was right in 
stating that these can be more cheaply purchased in vegetable than in animal foods. She 
TABLE A.

Family Diets above 3000 Calories per Man per Day.

Number | Calories in Diet | Age in years | Sex | Weight in lbs. | Standard Weight in lbs. | Difference
2 | 4003 | [?] | ♀ | 39 | 46.7 | - 7.7
2 | 4003 | 10 | ♂ | 63 | 67.5 | - 4.5
2 | 4003 | 8 | ♂ | 50 | 54.9 | - 4.9
2 | 4003 | 5 | ♂ | 35 | 39.9 | - 4.9
36 | 4091 | 3.25 | ♂ | 35 | [?] | [?]
4 | 3882 | 8 | ♀ | 45 | 52.2 | - 7.2
32 | 3822 | 6.25 | ♀ | 39 | 42.4 | - 3.4
4 | 3882 | 6 | ♀ | 39 | 42.4 | - 3.4
4 | 3882 | 10 | ♂ | 56 | 67.5 | -11.5
39 | 3422 | 10.5 | ♀ | 55 | 65 | -10.0
50 | 3471 | 6.25 | ♀ | 37 | 42.4 | - 5.4
50 | 3215 | 6 | ♀ | 47 | 42.4 | + 4.6
[?] | 3116 | 6 | ♀ | 43 | 42.4 | + 0.6
18 | 3248 | 5.5 | ♀ | 43 | 41.0 | + 2.0
54 | 3282 | 5 | ♀ | 33 | 39.6 | - 6.6
58 | 3080 | 6 | ♂ | 38 | 44.4 | - 6.4
30* | 3136 | 5.5 | ♂ | 21 | 41 | -20.0
49 | 3841 | 5.5 | ♂ | 42 | 41 | + 1

* Family with rickets. 


TABLE B.

Family Diets below 3000 Calories per Man per Day.

Number | Calories in Diet | Age in years | Sex | Weight in lbs. | Standard Weight in lbs. | Difference
14 | 2690 | 13 | ♀ | 76 | 87 | -11.0
14 | 2690 | 12 | ♀ | 60 | 76.4 | -16.4
14 | 2690 | 10 | ♀ | 45.5 | 62.0 | -17.5
[?] | 2936 | 10 | ♀ | 56 | 62.0 | - 6.0
7 | 2931 | 9.75 | ♀ | 44 | 62.0 | -18.0
[?] | 2686 | 5.75 | ♀ | 42 | 42.4 | - 0.4
14 | 2690 | 9 | ♂ | 45 | 60.4 | -15.4
41 | 2723 | 6.75 | ♂ | 53 | 49.7 | + 3.3
14 | 2690 | 6 | ♂ | 36 | 44.4 | - 8.4
57 | 2974 | 5 | ♂ | 37 | 39.9 | - 2.9
3 | 2891 | 5 | ♂ | 37 | 39.9 | - 2.9
2 | 2772 | 5.5 | ♀ | 34 | 41.0 | - 7.0
24 | 2412 | 11 | ♀ | 39 | 68.1 | -29.1
21 | 2329 | 9 | ♀ | 37.5 | 55.5 | -18.0
24 | 2412 | 6 | ♀ | 28 | 42.4 | -14.4
21 | 2329 | 11 | ♂ | 60 | 72.0 | -12.0
10 | 2435 | 8 | ♂ | 43 | 54.9 | -11.9
59 | 1978 | 5 | ♂ | 26 | 39.9 | -13.9
undoubtedly starts with the well-known conclusion that a Calorie in the food absorbed in a
mixed diet from whatever source, protein, fat or carbohydrate is of equal dynamic value. 
Previous work amply justifies this. 

She was not foolish enough to attempt to draw any conclusion from her investigations as 
to the relative value of animal and vegetable food in the diets on the physical development 
of the individuals. 


Professor Pearson seems entirely unable to grasp the fundamental fact that the physical 
development of the individual depends largely upon his past conditions of life. To co-relate 
it with the special constituents of the food which he habitually eats will require not only an 
enormous series of studies, but a full investigation of the character of the various food stuffs 
and of the mode of cooking. 


These points I tried to explain to him when I wrote to him in summer. He did not 
write to me as, in his criticism, he says he did. Miss Lindsay forwarded to me a letter 
from him to her, and I wrote a reply to Professor Pearson which he did not acknowledge. 

In conclusion I would say that before he expects his criticism of a physiological problem 
to be taken seriously, he had better make some attempt to understand the nature of the 
problem. Certainly it is not my intention to waste time in replying further to his criticism 
unless in the future it is more pertinent than is his present contribution. 


II. The Statistical Study of Dietaries. A Rejoinder. 


By KARL PEARSON, F.R.S. 


I PUBLISH Professor Noel Paton’s reply because it is very typical of the type of difficulty
which we meet with at present, when we assert that what is really statistical work must be 
undertaken only by the adequately trained statistician and that when it is not, then the 
investigation cannot be considered as falling into the field of science. 


Professor Paton states that the following question given on p. 4 of the Report formulated its 
object: “Do the working classes of this city get such a diet as will enable them to develop into 
strong, healthy, energetic men, and as men, will enable them to do a strenuous day’s work; or 
are the conditions of the labouring classes such that a suitable diet is not obtainable?”... 


Now Professor Paton either assumes that the sample taken of the diet of the individual 
family was their customary diet, or he does not. If he does, then the question: Was the diet 
such as would enable the working classes “to develop into strong, healthy, energetic men”?
has meaning. If he does not, not only is it idle, but the section dealing with the physique of 
the children on the basis of a sample diet taken as a rule for a week (occasionally for a fortnight), 
is beside the point. 


But anyhow, I ask how he can possibly ascertain how the working classes will “develop into 
strong, healthy, energetic men,” if he does not take an adequate anthropometric survey of the 
families subjected to the dietaries recorded? He says that it is accepted and proved that “If
a family diet...gives a yield of energy of less than 3500 calories per man per day it is insufficient 
for active work ; and if less than 3000, it is quite inadequate for the proper maintenance of 
growth and normal activity.” He further assumes with Miss Lindsay that calories from animal 
and vegetable foods have equal “dynamic value.” I assert that neither of these conclusions, 
which he accepts, are based on adequate research and they are in fact refuted by Miss Lindsay’s
own material. For, if it can be shown that animal and vegetable calories have different results 
on the physical development of the children, it is clear that the first statement as to how many 
calories are needful for the proper maintenance of growth has no significance until a statement 
is made with regard to the source of the calories. Professor Paton cites no evidence for his 
statements; from what I have read on the subject of calories, I feel convinced that most 
of the data on the matter would not stand for five minutes any adequate statistical analysis. 
The Report, Professor Paton tells us, shows “very markedly the relationship between the 
physique and the food.” Yet in a previous paragraph he says “that the physique is determined
by the whole previous condition of life and by the influence of heredity, and that it is absurd to 
attempt to relate it solely to the diet.” 


Now the only way to ascertain whether there was a marked relationship between the food 
and the physique of the children was to correlate the two for a constant age and investigate 
whether the correlations were such, having regard to their probable errors, that they could be 
considered significant. I did this with the result that the total calories in the food and the 
girls’ weight for constant age was not definitely significant with regard to the probable error, 
while in the case of the boys the probable error was so large that it was impossible to say 
whether the relationship was really considerable or not. In fact no marked relationship could 
be deduced from Miss Lindsay’s data, they were too inadequate. If Professor Paton’s statement 
as to the influence of heredity is to be trusted, then even my correction for age was inadequate, 
and the data ought to be corrected also for physique of parent! If so, why was the parent not 
measured ? 

Professor Paton places before the readers of Biometrika two tables on which this “marked” 
relationship is asserted by him to rest. One of the cases in his Table A, No. 32, is erroneously 
placed in this table; the details show that the number of calories was 2949 and not 3822* ; 
it should be in Table B. These tables contain 16 boys’ weights and 20 girls’ weights. Professor 
Paton takes the British Association measurements, which are, of course, wholly inadequate as a 
test of Glasgow children, and making no real correction for age† considers whether the children
in the two tables were or were not above the quite arbitrary limit of 5 lbs. below standard. He 
gives us no measure at all of the significance of the result, which is based on the vagaries of 
sampling 16 boys of ages from 3 to 11, and 20 girls from 5 to 13; and he supposes in some way 
that this treatment can possibly refute the correlation coefficient, ${}_a r_{wC_t}$, of weight and food
calories for constant age with its probable error! I can, however, throw more light on the 
matter. Owing to the great courtesy of Dr Chalmers, Medical Officer of Health for Glasgow, 
I have been able to more than treble the number of weights of the boys and girls subjected 
to the dietaries. The results for total calories in food, $C_t$, now are‡:

Girls, 69: ${}_a r_{wC_t} = +.21 \pm .08$; Boys, 55: ${}_a r_{wC_t} = +.05 \pm .09$.
Thus the relation for boys is now quite insignificant, and for girls may well be insignificant 
also. At any rate although both correlations are positive, there is no “marked” relationship 
between the physique and the dietary. Of course, it may be said that these weights (w) have been 
taken at some interval after the dietaries were recorded, but unless we assume the dietary to be 
a rough measure of the permanent feeding of the family, whose physique has been gradually 
built up for years before the dietaries were recorded, the observations must be discarded as of no 
value at all for testing physique, or as Professor Paton phrases it “development.” 
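
The argument above turns on comparing each correlation with its probable error. A minimal sketch follows, assuming the classical probable error of a correlation coefficient, 0.6745 (1 − r²)/√n, and reading the two coefficients as +.21 for 69 girls and +.05 for 55 boys. The text does not say which formula was actually used, and a partial correlation would strictly call for a small adjustment to n, so this is only an approximation. The function and variable names are illustrative.

```python
from math import sqrt


def probable_error(r, n):
    """Classical probable error of a correlation coefficient r
    estimated from n observations: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / sqrt(n)


# Correlation of weight with total calories for constant age, as read from the text.
R_GIRLS, N_GIRLS = 0.21, 69
R_BOYS, N_BOYS = 0.05, 55

pe_girls = probable_error(R_GIRLS, N_GIRLS)
pe_boys = probable_error(R_BOYS, N_BOYS)

# Ratio of each coefficient to its probable error.
ratio_girls = R_GIRLS / pe_girls
ratio_boys = R_BOYS / pe_boys

print(round(pe_girls, 2), round(pe_boys, 2))
print(round(ratio_girls, 1), round(ratio_boys, 1))
```

Under these assumptions the probable errors round to the quoted .08 and .09. The girls' coefficient is somewhat under three times its probable error and the boys' is well under one, which is consistent with the description of the former as possibly insignificant and the latter as quite insignificant.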


* In the Appendix V of Rickety Families, it is given again ; this time as 2329 calories, 

+ The deviation at each age would have to be measured in terms of the standard-deviation of weight 
at that age; naturally the deviations are larger for older children. 

+ I have to thank Miss B, M. Cave for the present series of correlations, 
But the most interesting point ascertained from the new material is the confirmation of the 
result that the higher the proportion of animal to vegetable calories the greater the weight. In 
Biometrika, Vol. IX, p. 533, we had for 16 boys and 20 girls: 


Boys: ${}_a r_{w,\,C_v/C_a} = -.23 \pm .16$, 

Girls: ${}_a r_{w,\,C_v/C_a} = -.12 \pm .15$. 
We now have for 55 boys and 69 girls: 

Boys: alws Cp/C4= ~ 380+ 08, 

Girls: ailing, Cy [C4 =— — 94. + ‘08. 


These results seem to indicate that Miss Lindsay and Professor Paton, who supports her view, 
are in error when they consider a calory the same whether it be from animal or vegetable food. 
On the other hand, our larger numbers now indicate that : 


(i) For a constant age the expenditure on vegetable or on animal food has no sensible relation 
to weight. 


(ii) For a constant age the number of calories in vegetable food has no sensible relation to 
weight. 


(iii) For a constant age the number of calories in animal food has a positive correlation with 
weight for both girls and boys, being definitely significant in the first case (+.32 ± .07) and not 
so in the second (+.08 ± .09). 

(iv) For a constant age the correlations of weight with ratio of expenditure on vegetable and 
animal foods are for both boys and girls quite insignificant as compared with their probable 
errors. 

I am extremely obliged to Dr Chalmers for doing his best to supply additional material. As 
far as it goes, it tends to show that calories are of far more importance than expenditures, but 
that calories from animal food are more closely related to physique than are calories from 
vegetable food*. The new material supports my criticisms that the failure to distinguish 
between animal and vegetable calories stultified the advice given by Miss Lindsay, i.e. to spend 
money on oatmeal rather than on eggs. It also indicates that no safe conclusions with regard 
to dietaries can be drawn until a reasonable anthropometric survey accompanies the record 
of dietaries, and the whole is reduced with adequate statistical knowledge. 


One point I can allow Professor Paton. It was an oversight on my part, when I said that 
I had written to both Miss Lindsay and to himself; the letters in which Miss Lindsay and he 
stated that to follow up the families now would be impossible were both replies to one and the 
same letter of mine addressed to Miss Lindsay. The additional facts I desired were in their 
opinion unascertainable, and further correspondence did not seem to me likely to be of any 
service in achieving the end I had in view, namely to render of real service to science a piece of 
recording work from which in my opinion then and in my opinion still, very misleading conclu- 
sions had been drawn, and which conclusions in their turn had been exaggerated in the press 
résumés of the paper. I do not think any such work as that done on dietaries by Miss Lindsay 
and Professor Noel Paton will be of real value until (i) these dietaries are accompanied by 
a thorough anthropometric survey of the whole families of the dieted and (ii) the equality of 
animal and vegetable food calories ceases to be considered as a dogmatic truth. 


* Of course the results show that on such data as are available, the food has relatively little relation 
to the weight; there is no “marked” relationship. 
III. Note on the essential Conditions that a Population breeding at 
random should be in a Stable State. 


By K. PEARSON, F.R.S. 


Let us deal with bi-parental inheritance in the first place. Let $x$ be a character in the father, 
mean $\bar{x}$, standard deviation $\sigma_1$; let $y$ be the same character in the mother, $\bar{y}$ its mean, and $\sigma_2$ its 
standard deviation. Let $z$ be the character in offspring of one sex, $\sigma_3$ be the standard deviation of 
all offspring of this sex and $\bar{z}$ the mean. Let $\mu_2', \mu_3', \mu_4'$; $\mu_2'', \mu_3'', \mu_4''$; $\mu_2''', \mu_3''', \mu_4'''$ be the 
moment coefficients about the means respectively of father, mother and offspring frequency 
distributions. Let $\bar{z}_{xy}$ be the mean of the offspring of those parents who have characters $x$ and $y$, and 
let the array of frequency of such offspring be given by $f_3(u)\,du$ about $\bar{z}_{xy}$, i.e. the character of any 
offspring in this array is $\bar{z}_{xy} + u$, where $u$ is independent of the parental characters $x$ and $y$, but 
$\bar{z}_{xy}$ is a function of $x$ and $y$, the parental characters. Some writers have suggested that the 
offspring character should be taken as a blend of the parental characters, i.e. 

$$z = \tfrac{1}{2}(x + y),$$ 
understanding by blend the mean of the parental characters. This appears to be very unsatis- 
factory for: 


(a) It supposes the parental characters to fix absolutely the offspring characters which is far 
from a result of experience. 


(b) It supposes the mother to reproduce the female size of character in the male and the 
female offspring alike, whereas she contributes to each the sex character of her own stock, i.e. if 
she is a tall woman, she would contribute absolutely more to a son than to a daughter. The late 
Sir Francis Galton got over this difficulty by “reducing female measures to their male 
equivalents.” This he did by altering absolute measurements in the ratio of male to female mean 
measurements. Thus he would take for the mean of his array of offspring 

$$\bar{z}_{xy} = \tfrac{1}{2}\left(x + \frac{\bar{x}}{\bar{y}}\,y\right),$$ 

if he were dealing with male offspring. A more reasonable hypothesis is to assume that 

$$\bar{z}_{xy} = \tfrac{1}{2}\,\sigma_3\left(\frac{x}{\sigma_1} + \frac{y}{\sigma_2}\right) \qquad \text{(i)}.$$ 

This will practically agree with Sir Francis’s form, if the coefficients of variation in the two sexes 
are the same, i.e. $\sigma_1/\bar{x} = \sigma_2/\bar{y}$. 


If we measure $u$ from the mean of the array of offspring we have 

$$\bar{z} = \tfrac{1}{2}\,\sigma_3\left(\frac{\bar{x}}{\sigma_1} + \frac{\bar{y}}{\sigma_2}\right) \qquad \text{(ii)}.$$ 

We shall now suppose the offspring to follow the law (i), or 

$$z - \bar{z} = \tfrac{1}{2}\,\sigma_3\left(\frac{x - \bar{x}}{\sigma_1} + \frac{y - \bar{y}}{\sigma_2}\right) + u \qquad \text{(iii)},$$ 

where $x$ and $y$ are uncorrelated (mating at random), and $u$ represents other influences than the 
parental, and is therefore uncorrelated with $x$ and $y$*. The frequency distributions of $x$ and $y$ 


may be taken as given by $f_1(x - \bar{x})$ and $f_2(y - \bar{y})$. Let $N_1 \times N_2$ be the total number of possible 
matings 

$$= \iint f_1(x - \bar{x})\, f_2(y - \bar{y})\, dx\, dy,$$ 

and the total number of offspring $N_3$ in any array 

$$= \int f_3(u)\, du.$$ 

* This assumes the homoscedasticity of the arrays of offspring due to pairs of fathers and mothers 
with characters $x$ and $y$. 


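I now propose to give the expression for the $n$th moment coefficient about the mean, i.e. $\mu_n'''$, 
of the population of offspring of a given sex. We have 

$$N_1 N_2 N_3\, \mu_n''' = \iiint \left\{\tfrac{1}{2}\sigma_3\left(\frac{x-\bar{x}}{\sigma_1} + \frac{y-\bar{y}}{\sigma_2}\right) + u\right\}^{n} f_1(x-\bar{x})\, f_2(y-\bar{y})\, f_3(u)\, dx\, dy\, du,$$ 

the integration being extended over the whole of the frequency distributions of father, mother 
and offspring. Thus 

$$\mu_n''' = \frac{1}{N_1 N_2 N_3} \iiint \sum_{s=0}^{n} \sum_{t=0}^{n-s} \frac{n!}{s!\,t!\,(n-s-t)!}\, \frac{\sigma_3^{\,n-s}}{2^{\,n-s}}\, \frac{(x-\bar{x})^{n-s-t}\,(y-\bar{y})^{t}\,u^{s}}{\sigma_1^{\,n-s-t}\,\sigma_2^{\,t}}\, f_1(x-\bar{x})\, f_2(y-\bar{y})\, f_3(u)\, dx\, dy\, du.$$ 

Now $x$, $y$ and $u$ being independent we have 

$$\frac{1}{N_1}\int (x-\bar{x})^{n-s-t} f_1(x-\bar{x})\,dx = \mu'_{n-s-t}, \qquad \frac{1}{N_2}\int (y-\bar{y})^{t} f_2(y-\bar{y})\,dy = \mu''_{t}, \qquad \frac{1}{N_3}\int u^{s} f_3(u)\,du = \mu^{iv}_{s},$$ 

$$\therefore\quad \mu_n''' = \sum_{s=0}^{n} \frac{\sigma_3^{\,n-s}}{2^{\,n-s}}\, \frac{n!}{(n-s)!\,s!} \left\{ \sum_{t=0}^{n-s} \frac{(n-s)!}{(n-s-t)!\,t!}\, \frac{\mu'_{n-s-t}\,\mu''_{t}\,\mu^{iv}_{s}}{\sigma_1^{\,n-s-t}\,\sigma_2^{\,t}} \right\} \qquad \text{(iv)}.$$ 

Thus we reach, remembering that $\mu_1' = \mu_1'' = \mu_1^{iv} = 0$, 

$$\mu_2''' = \tfrac{1}{4}\sigma_3^2\left(\frac{\mu_2'}{\sigma_1^2} + \frac{\mu_2''}{\sigma_2^2}\right) + \mu_2^{iv} \qquad \text{(v)},$$ 

$$\mu_3''' = \tfrac{1}{8}\sigma_3^3\left(\frac{\mu_3'}{\sigma_1^3} + \frac{\mu_3''}{\sigma_2^3}\right) + \mu_3^{iv} \qquad \text{(vi)},$$ 

$$\mu_4''' = \tfrac{1}{16}\sigma_3^4\left(\frac{\mu_4'}{\sigma_1^4} + 6\,\frac{\mu_2'\,\mu_2''}{\sigma_1^2\sigma_2^2} + \frac{\mu_4''}{\sigma_2^4}\right) + \tfrac{3}{2}\sigma_3^2\left(\frac{\mu_2'}{\sigma_1^2} + \frac{\mu_2''}{\sigma_2^2}\right)\mu_2^{iv} + \mu_4^{iv} \qquad \text{(vii)}.$$ 

But $\mu_2' = \sigma_1^2$, $\mu_2'' = \sigma_2^2$, and $\mu_2''' = \sigma_3^2$. Hence we must have 

$$\mu_2^{iv} = \tfrac{1}{2}\sigma_3^2 \qquad \text{(viii)}.$$ 

If as usual we take $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$ we find from (vi) and (vii), writing $s^2 = \mu_2^{iv}$, 

$$\sqrt{\beta_1^{iv}}\; s^3 = \sigma_3^3\left\{\sqrt{\beta_1'''} - \tfrac{1}{8}\left(\sqrt{\beta_1'} + \sqrt{\beta_1''}\right)\right\} \qquad \text{(ix)},$$ 

$$\beta_2^{iv}\, s^4 = \sigma_3^4\left\{\beta_2''' - \tfrac{1}{16}\left(\beta_2' + \beta_2'' + 6\right)\right\} - 3\,\sigma_3^2\, s^2 \qquad \text{(x)}.$$ 

Whence by the use of (viii) 

$$\sqrt{\beta_1^{iv}} = 2\sqrt{2}\left\{\sqrt{\beta_1'''} - \tfrac{1}{8}\left(\sqrt{\beta_1'} + \sqrt{\beta_1''}\right)\right\} \qquad \text{(xi)},$$ 

$$\beta_2^{iv} = 4\left\{\beta_2''' - \tfrac{1}{16}\left(\beta_2' + \beta_2'' + 6\right)\right\} - 6 \qquad \text{(xii)}.$$ 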
Hence in order that the offspring population should be stable, it is needful that in the array of 
offspring for given parents: 

(a) $s = \dfrac{1}{\sqrt{2}}\,\sigma_3$. 

(b) $\sqrt{\beta_1^{iv}} = 2\sqrt{2}\left\{\sqrt{\beta_1'''} - \tfrac{1}{8}\left(\sqrt{\beta_1'} + \sqrt{\beta_1''}\right)\right\} = 2\sqrt{2}\,\sqrt{\beta_1'''}\left(1 - \tfrac{1}{4}\right) = \dfrac{3}{\sqrt{2}}\sqrt{\beta_1'''}$, 

if $\beta_1' = \beta_1'' = \beta_1'''$, i.e. the skewness be the same for fathers, mothers and offspring. 

(c) $\beta_2^{iv} = \tfrac{1}{2}\left(7\beta_2''' - 15\right)$, 

if $\beta_2' = \beta_2'' = \beta_2'''$. 

Thus, we have for the array of offspring of given parents 

$$s = \frac{1}{\sqrt{2}}\,\sigma_3, \qquad \beta_1^{iv} = \tfrac{9}{2}\,\beta_1''', \qquad \beta_2^{iv} - 3 = \tfrac{7}{2}\left(\beta_2''' - 3\right) \qquad \text{(xiii)}.$$ 

Accordingly the variability of the array is less than that of the population of offspring; and 
the array (unless $\beta_1''' = 0$, $\beta_2''' = 3$) is more skew and has greater kurtosis than the general 
population. 
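
As a numerical check on (xi) to (xiii), the following minimal sketch evaluates the array constants from the constants of the father, mother and offspring populations. The function name and argument order are illustrative assumptions, and the positive square root of each $\beta_1$ is taken, which presumes the skewness has the same sign in all three distributions.

```python
from math import sqrt, isclose


def array_constants(b1_f, b1_m, b1_o, b2_f, b2_m, b2_o):
    """Return (s / sigma_3, beta_1, beta_2) for the array of offspring of given parents."""
    s_ratio = 1.0 / sqrt(2.0)                                                   # from (viii)
    root_b1 = 2.0 * sqrt(2.0) * (sqrt(b1_o) - (sqrt(b1_f) + sqrt(b1_m)) / 8.0)  # (xi)
    b2 = 4.0 * (b2_o - (b2_f + b2_m + 6.0) / 16.0) - 6.0                        # (xii)
    return s_ratio, root_b1 ** 2, b2


# A symmetrical, mesokurtic population reproduces itself in the array.
print(array_constants(0.0, 0.0, 0.0, 3.0, 3.0, 3.0))

# Equal constants beta_1 = 0.2, beta_2 = 3.4 in all three populations, as supposed in (xiii).
s_ratio, b1_array, b2_array = array_constants(0.2, 0.2, 0.2, 3.4, 3.4, 3.4)
print(round(b1_array, 4), round(b2_array, 4))
```

With equal constants the array values agree with (xiii): $\beta_1$ is multiplied by 9/2 and the excess of $\beta_2$ over 3 by 7/2.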

If $r_{12}$, $r_{23}$, $r_{31}$ be the three correlations of father, mother and offspring we know that the mean 
standard deviation of the offspring of arrays having the same parents is 

$$s' = \sigma_3 \sqrt{\frac{1 - r_{12}^2 - r_{23}^2 - r_{13}^2 + 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{12}^2}},$$ 

and this equals, if there be no assortative mating ($r_{12} = 0$), 

$$\sigma_3 \sqrt{1 - r_{13}^2 - r_{23}^2}.$$ 

If we could assume this equal to $s$ we must have, since $s = \dfrac{1}{\sqrt{2}}\,\sigma_3$, 

$$\frac{1}{\sqrt{2}} = \sqrt{1 - r_{13}^2 - r_{23}^2},$$ 

leading to $r_{13}^2 + r_{23}^2 = \tfrac{1}{2}$, 

or if the two parental correlations are equal to 

$$r_{13} = r_{23} = .5.$$ 

In other words, if the parental influences were equal and there were no assortative mating 
and the character in the array of offspring had the mean value 

$$\bar{z}_{xy} = \tfrac{1}{2}\,\sigma_3\left(\frac{x}{\sigma_1} + \frac{y}{\sigma_2}\right),$$ 

then the population could only be stable if 

$$r_{13} = r_{23} = 0.5.$$ 
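
The condition just reached can be verified numerically. The sketch below evaluates the standard deviation of an array of offspring of given parents, in units of $\sigma_3$, from the three correlations; the function name is an illustrative assumption.

```python
from math import sqrt, isclose


def array_sd_ratio(r12, r13, r23):
    """s' / sigma_3 for the array of offspring of given father and mother."""
    numerator = 1.0 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2.0 * r12 * r13 * r23
    return sqrt(numerator / (1.0 - r12 ** 2))


# No assortative mating and equal parental correlations of 0.5:
# the array s.d. is sigma_3 / sqrt(2), the value required for stability.
print(array_sd_ratio(0.0, 0.5, 0.5), 1.0 / sqrt(2.0))

# With assortative mating r12 = 0.15 the same parental correlations leave a wider array.
print(array_sd_ratio(0.15, 0.5, 0.5))
```

When $r_{13} = r_{23} = r$ the general expression reduces to $\sqrt{1 - 2r^2/(1 + r_{12})}$, which the second call can be compared against.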
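But this apparently noteworthy result only begs the question. By the general theory of 
correlation the mean of the array of offspring is 

$$\bar{z}_{xy} = \bar{z} + \sigma_3\left\{\frac{r_{13} - r_{12}\,r_{23}}{1 - r_{12}^2}\,\frac{x - \bar{x}}{\sigma_1} + \frac{r_{23} - r_{12}\,r_{13}}{1 - r_{12}^2}\,\frac{y - \bar{y}}{\sigma_2}\right\},$$ 

or, if there be no assortative mating, 

$$\bar{z}_{xy} = \sigma_3\left(r_{13}\,\frac{x}{\sigma_1} + r_{23}\,\frac{y}{\sigma_2}\right) + \bar{z} - \sigma_3\left(r_{13}\,\frac{\bar{x}}{\sigma_1} + r_{23}\,\frac{\bar{y}}{\sigma_2}\right).$$ 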


Hence if we assume the mean of array of offspring to be given by 

$$\tfrac{1}{2}\,\sigma_3\left(\frac{x}{\sigma_1} + \frac{y}{\sigma_2}\right),$$ 

(i) the second portion of the expression must be zero, i.e. mean of whole population of 
offspring must coincide with mean of array of offspring where parents have the mean values and 
(ii) we must have $r_{13} = r_{23} = \tfrac{1}{2}$. In other words the form of our assumption involves both the 
equal influence of the parents and the value of the parental correlation. 


From the standpoint of heredity no such assumption is legitimate. Neither in Mendelian 
theory nor in biometric formula, nor again in actual observation is it permissible to suppose that 
the mean of the array of offspring is determined solely by the parents. Still less is it possible 
to suppose the actual character of the offspring to be the mean of that of the parents (i.e. put 
$u = 0$). If it were we should have $z = \tfrac{1}{2}(x + y)$, whence flow 

$$\mu_2''' = \tfrac{1}{4}\left(\mu_2' + \mu_2''\right), \qquad \mu_3''' = \tfrac{1}{8}\left(\mu_3' + \mu_3''\right), \qquad \mu_4''' = \tfrac{1}{16}\left(\mu_4' + 6\,\mu_2'\,\mu_2'' + \mu_4''\right) \qquad \text{(xiv)}.$$ 

But these equations assume that $\mu_2^{iv}$, $\mu_3^{iv}$ and $\mu_4^{iv}$ are all zero, an absurdity in itself and 
contrary to all experience, whether biometric or Mendelian. For non-assortative mating and 
equal potency of parents, they lead to parental correlations of the order .7 and to an impossibility 
of stability in any population*. 


In fact any such relations as (xiv) are inconceivable on the basis of both biometric as well as 
Mendelian theory and observation. Parental correlations have never been observed anywhere 
near such a value as 0.7. Equations (xiii) are, however, suggestive; they show that if the 
parental distribution be symmetrical and mesokurtic, the array of offspring will remain so after 
selection; but if the parental distribution does not possess these characters, then any selection of 
individual parents will emphasize the asymmetry and the kurtosis in the resulting array of 
offspring ; or continued selection of this type will lead to greater and greater divergence from the 
normal or Gaussian frequency distribution. 


* If we assume that the mean of the array of offspring of parents of characters $x$ and $y$ is given by 
$lx + my$, it is only another way of asserting that the regression is linear and that 

$$l = \frac{r_{13} - r_{12}\,r_{23}}{1 - r_{12}^2}\,\frac{\sigma_3}{\sigma_1}, \qquad m = \frac{r_{23} - r_{12}\,r_{13}}{1 - r_{12}^2}\,\frac{\sigma_3}{\sigma_2}.$$ 

If we make $l = m$, or give equal weight to the parents, it is only rational to suppose that $\sigma_1 = \sigma_2$ and 
$r_{13} = r_{23}$, which lead us to 

$$l = m = \frac{r_{13}}{1 + r_{12}}\,\frac{\sigma_3}{\sigma_1}.$$ 

Hence the mean of the array is $\dfrac{r_{13}}{1 + r_{12}}\,\dfrac{\sigma_3}{\sigma_1}\,(x + y)$, 

and whether we make $x$ constant and $y$ constant or $x + y$ constant leads to precisely the same variability 
in the array, i.e. 

$$s = \sigma_3 \sqrt{\frac{1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{12}^2}} = \sigma_3 \sqrt{1 - \frac{2\,r_{13}^2}{1 + r_{12}}}.$$ 

If assortative mating be zero, this equals 

$$\sigma_3 \sqrt{1 - 2\,r_{13}^2},$$ 

and, if to reach the results for $\mu'''$ given above we put this zero, we must have 

$$r_{13} = \sqrt{.50} = 0.7 \text{ nearly}.$$ 
IV. The Elimination of Spurious Correlation due to position in Time 
or Space. 


By “STUDENT.” 


In the Journal of the Royal Statistical Society for 1905*, p. 696, appeared a paper by 
R. H. Hooker giving a method of determining the correlation of variations from the “in- 
stantaneous mean” by correlating corresponding differences between successive values. This 
method was invented to deal with the many statistics which give the successive annual values 
of vital or commercial variables; these values are generally subject to large secular variations, 
sometimes periodic, sometimes uniform, sometimes accelerated, which would lead to altogether 
misleading values were the correlation to be taken between the figures as they stand. 

Since Mr Hooker published his paper, the method has been in constant use among those who 
have to deal statistically with economic or social problems, and helps to show whether, for 
example, there really is a close connection between the female cancer death rate and the quantity 
of imported apples consumed per head! 

Prof. Pearson, however, has pointed out to me that the method is only valid when the 
connection between the variables and time is linear, and the following note is an effort to extend 
Mr Hooker’s method so as to make it applicable in a rather more general way. 

If $x_1, x_2, x_3$, etc., $y_1, y_2, y_3$, etc., be corresponding values of the variables $x$ and $y$, then if 
$x_1, x_2, x_3$, etc., $y_1, y_2, y_3$, etc. are randomly distributed in time and space, it is easy to show that 
the correlation between the corresponding $n$th differences is the same as that between $x$ and $y$. 

Let ${}_nD_x$ be the $n$th difference. 

For ${}_1D_x = x_1 - x_2$, $\therefore\ {}_1D_x^2 = x_1^2 - 2x_1x_2 + x_2^2$. 

Summing for all values and dividing by $N$ and remembering that since $x_1$ and $x_2$ are mutually 
random $S(x_1 x_2) = 0$, we get† 

$$\sigma^2_{{}_1D_x} = 2\,\sigma_x^2.$$ 

Again, ${}_1D_y = y_1 - y_2$, $\therefore\ {}_1D_x\,{}_1D_y = x_1y_1 - x_2y_1 - x_1y_2 + x_2y_2$. 

Summing for all values and dividing by $N$, and remembering that $x_1$ and $y_2$ and $x_2$ and $y_1$ are 
mutually random, 

$$r_{{}_1D_x\,{}_1D_y}\;\sigma_{{}_1D_x}\,\sigma_{{}_1D_y} = 2\,r_{xy}\,\sigma_x\,\sigma_y, \qquad \therefore\ r_{{}_1D_x\,{}_1D_y} = r_{xy}.$$ 

Proceeding successively, 

$$r_{{}_nD_x\,{}_nD_y} = r_{{}_{n-1}D_x\,{}_{n-1}D_y} = \dots = r_{xy} \qquad (1).$$ 

Now suppose $x_1, x_2, x_3$, etc. are not random in space or time; the problems arising from 
correlation due to successive positions in space are exactly similar to those due to successive 
occurrence in time, but as they are to some extent complicated by the second dimension, it is 
perhaps simpler to consider correlation due to time. 
Suppose then $x_1 = X_1 + bt_1 + ct_1^2 + dt_1^3 + \text{etc.}$, $x_2 = X_2 + bt_2 + ct_2^2 + dt_2^3 + \text{etc.}$, 
where $X_1$, $X_2$, etc. are independent of time and $t_1, t_2, t_3$ are successive values of time, so that 
$t_n - t_{n-1} = T$, and suppose $y_1 = Y_1 + b't_1 + c't_1^2 + \text{etc.}$ as before. 


* The method had been used by Miss Cave in Proc. Roy. Soc. Vol. LXXIV. pp. 407 et seq., that is in 
1904, but being used incidentally in the course of a paper it attracted less attention than Hooker’s 
paper which was devoted to describing the method. The papers were no doubt quite independent. 

† The assumption made is that $n$ is sufficiently large to justify the relations 

$$S_1^{\,n-1}(x)/(n-1) = S_2^{\,n}(x)/(n-1) = S_1^{\,n}(x)/n \quad\text{and}\quad S_1^{\,n-1}(x^2)/(n-1) = S_2^{\,n}(x^2)/(n-1) = S_1^{\,n}(x^2)/n$$ 

being taken to hold. 
Then ${}_1D_x = {}_1D_X - bT - cT(t_1 + t_2) - dT(t_1^2 + t_1t_2 + t_2^2) - \text{etc.}$, or, since $t_2 = t_1 + T$, 

$${}_1D_x = {}_1D_X - \{bT + cT^2 + dT^3 + \text{etc.}\} - t_1\{2cT + 3dT^2 + 4eT^3 + \text{etc.}\} - t_1^2\{3dT + 6eT^2 + \text{etc.}\} - \text{etc.}$$ 

In this series the coefficients of $t_1$, $t_1^2$, etc. are all constants and the highest power of $t_1$ is one 
lower than before, so that by repeating the process again and again we can eliminate $t$ from the 
variable on the right-hand side, provided of course that the series ends at some power of $t$. 
When this has been done, we get 

$${}_nD_x = {}_nD_X + \text{a constant}, \qquad {}_nD_y = {}_nD_Y + \text{a constant},$$ 

$$\therefore\ r_{{}_nD_x\,{}_nD_y} = r_{{}_nD_X\,{}_nD_Y} = r_{XY},$$ 

and of course $r_{{}_{n+1}D_x\,{}_{n+1}D_y} = r_{{}_nD_x\,{}_nD_y}$, for ${}_nD_X$ and ${}_nD_Y$ are now random variables independent 
of time. 

Hence if we wish to eliminate variability due to position in time or space and to determine 
whether there is any correlation between the residual variations, all that has to be done is to 
correlate the 1st, 2nd, 3rd ... $n$th differences between successive values of our variable with the 
1st, 2nd, 3rd ... $n$th differences between successive values of the other variable. When the 
correlation between the two $n$th differences is equal to that between the two $(n+1)$th differences, 
this value gives the correlation required. 
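
The procedure can be sketched on artificial data. In the example below the two series have a true residual correlation of 0.6 and equal and opposite quadratic trends; the series length, trend coefficients, seed and function names are all assumptions made for illustration and are not taken from the note.

```python
import random
from math import sqrt


def differences(values, order=1):
    """Successive differences v[i] - v[i+1], repeated `order` times."""
    out = list(values)
    for _ in range(order):
        out = [a - b for a, b in zip(out, out[1:])]
    return out


def correlation(xs, ys):
    """Product-moment correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(n=2000, r=0.6, seed=0):
    """Random parts X, Y with correlation r, plus opposite quadratic time trends."""
    rng = random.Random(seed)
    xs, ys = [], []
    for t in range(n):
        big_x = rng.gauss(0.0, 1.0)
        big_y = r * big_x + sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        xs.append(big_x + 1e-3 * t * t)
        ys.append(big_y - 1e-3 * t * t)
    return xs, ys


xs, ys = simulate()
# Correlation of the raw figures (order 0) and of the 1st to 4th differences.
by_order = [correlation(differences(xs, k), differences(ys, k)) for k in range(5)]
for k, value in enumerate(by_order):
    print(k, round(value, 3))
```

The raw figures give a correlation close to -1, produced wholly by the trends. From the second difference onward the quadratic terms have been eliminated and the successive values should settle near the residual correlation, which is the stopping rule described above.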

This process is tedious in the extreme, but that it may sometimes be necessary is illustrated 
by the following examples: the figures from which the first two are taken were very kindly 
supplied to me by Mr E. G. Peake, who had been using them in preparing his paper “The 
Application of the Statistical Method to the Bankers’ Problem” in The Bankers’ Magazine (July— 
August, 1912). The material for the next is taken from a paper in The Journal of Agricultural 
Science by Hall and Mercer, on the error of field trials, and are the yields of wheat and straw on 
500 1/500-acre plots into which an acre of wheat was divided at harvest. The remainder are from 
the three Registrar-Generals’ returns. 


                     |        I         |    II    |   III    |     IV      |      V      |     VI 
Correlation between  | Sauerbeck's      | Marriage | Yield of |       Tuberculosis Death Rate 
                     | Index numbers    | Rate     | Grain    | 
and                  | Bankers' Clearing| Wages    | Yield of |       Infantile Mortality 
                     | House rates      |          | Straw    | 
                     | per head         |          |          |   Ireland   |   England   |  Scotland 
Raw figures          |      -.33        |   -.52   |  +.753   | [illegible] |     .35     |   +.02 
First difference     |      +.51        |   +.67   |  +.590   |    +.75     |    +.69     |   +.51 
Second difference    |      +.30        |   +.58   |  +.539   |     .74     |     .74     |   +.65 
Third difference     |      +.07        |   +.52   |  +.530   |      -      |      -      |     - 
Fourth difference    |      +.11        |   +.55   |  +.524   |      -      |      -      |     - 
Fifth difference     |      +.05        |   +.58   |    -     |      -      |      -      |     - 
Sixth difference     |        -         |   +.55   |    -     |      -      |      -      |     - 
Number of cases      |    41 years      | 57 years | 500 plots|              42 years 


The difference between I and II is very marked, and would seem to indicate that the causal 
connection between index numbers and Bankers’ clearing house rates is not altogether of the 
same kind as that between marriage rate and wages, though all four variables are commonly 
taken as indications of the short period trade wave. I had hoped to investigate this subject 
more thoroughly before publishing this note, but lack of time has made this impossible. 
V. On certain Errors with regard to Multiple Correlation occasionally 
made by those who have not adequately studied this Subject. 


By KARL PEARSON, F.R.S. 


(1) It is well known* that if we endeavour to predict the value of a variate $x_0$ from $n$ 
correlated variates $x_1, x_2, \dots x_n$, by determining a linear function of $x_1, x_2, \dots x_n$ which has 
the maximum correlation $R_n$ with $x_0$, then the value of $R_n^2$ is given by 

$$R_n^2 = 1 - \Delta/\Delta_{00},$$ 

where $\Delta$ is the determinant 

$$\Delta = \begin{vmatrix} 1 & r_{01} & r_{02} & \dots & r_{0n} \\ r_{10} & 1 & r_{12} & \dots & r_{1n} \\ \vdots & \vdots & \vdots & & \vdots \\ r_{n0} & r_{n1} & r_{n2} & \dots & 1 \end{vmatrix}$$ 

and $\Delta_{pq}$ is the minor corresponding to the constituent of the $p$th column and $q$th row. 


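The system I propose to consider is that in which all correlations like $r_{0p}$ are equal, whatever 
$p$ be, to a constant $\rho$, and all correlations $r_{pq}$, where $p$ and $q$ may take any values from 1 to $n$, 
are the same and equal to $\epsilon$. We now have for the value of $\Delta$ the expression 

$$\Delta = \begin{vmatrix} 1 & \rho & \rho & \dots & \rho \\ \rho & 1 & \epsilon & \dots & \epsilon \\ \rho & \epsilon & 1 & \dots & \epsilon \\ \vdots & \vdots & \vdots & & \vdots \\ \rho & \epsilon & \epsilon & \dots & 1 \end{vmatrix}.$$ 
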
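To evaluate this determinant add all the rows but the first together, giving 

$$n\rho, \quad 1 + (n-1)\epsilon, \quad 1 + (n-1)\epsilon, \quad \dots \quad 1 + (n-1)\epsilon,$$ 

multiply the result by $\rho/(1 + (n-1)\epsilon)$ and subtract from the first row. We have 

$$\Delta = \begin{vmatrix} 1 - \dfrac{n\rho^2}{1 + (n-1)\epsilon} & 0 & 0 & \dots & 0 \\ \rho & 1 & \epsilon & \dots & \epsilon \\ \rho & \epsilon & 1 & \dots & \epsilon \\ \vdots & \vdots & \vdots & & \vdots \\ \rho & \epsilon & \epsilon & \dots & 1 \end{vmatrix} = \left\{1 - \frac{n\rho^2}{1 + (n-1)\epsilon}\right\} \Delta_{00}.$$ 

Hence 

$$R_n^2 = 1 - \frac{\Delta}{\Delta_{00}} = \frac{n\rho^2}{1 + (n-1)\epsilon} \qquad \text{(i)},$$ 

or 

$$R_n\dagger = \rho \sqrt{\frac{n}{1 + (n-1)\epsilon}} \qquad \text{(ii)}.$$ 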
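
The reduction of the determinant can be checked numerically. The sketch below builds the equicorrelated matrix for given $n$, $\rho$ and $\epsilon$, evaluates $1 - \Delta/\Delta_{00}$ by ordinary elimination, and compares it with the closed form (i). All function names are illustrative assumptions.

```python
from math import isclose


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    value = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-15:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            value = -value  # a row interchange changes the sign
        value *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return value


def equicorrelated(n, rho, eps):
    """(n+1) x (n+1) matrix: rho between x_0 and each x_p, eps among the x_p."""
    size = n + 1
    m = [[eps] * size for _ in range(size)]
    for i in range(size):
        m[i][i] = 1.0
    for j in range(1, size):
        m[0][j] = m[j][0] = rho
    return m


def r_squared_from_determinants(n, rho, eps):
    m = equicorrelated(n, rho, eps)
    minor_00 = [row[1:] for row in m[1:]]  # strike out first row and column
    return 1.0 - det(m) / det(minor_00)


def r_squared_closed_form(n, rho, eps):
    return n * rho ** 2 / (1.0 + (n - 1) * eps)  # equation (i)


for n in (1, 2, 5, 10):
    print(n, r_squared_from_determinants(n, 0.5, 0.5), r_squared_closed_form(n, 0.5, 0.5))
```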


* Biometrika, Vol. vit. p. 439. 
† The sign of $R_n$ must be determined from other considerations. 
Thus if $n$ variates are equally correlated ($\epsilon$) among themselves, and equally correlated ($\rho$) with 
another variable, we shall not indefinitely increase the accuracy with which the last variable will 
be predicted from the others by increasing indefinitely the number of the variates $n$. 


Illustration. The coefficient of multiple correlation is required as we increase the number of 
brothers from whom a prediction of a character in a given brother is made. The fraternal 
correlation = .5. 


Number of Brothers      $R_n$ 
         1              .5000 
         2              .5774 
         3              .6124 
         4              .6325 
         5              .6455 
         6              .6547 
        10              .6742 
         ∞              .7071 
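
The table can be reproduced from formula (ii). This is a minimal sketch in which the function names are illustrative assumptions; the limiting value for an indefinitely great number of brothers is $\rho/\sqrt{\epsilon}$.

```python
from math import sqrt


def multiple_r(n, rho, eps):
    """Multiple correlation of x_0 with n equally correlated variates, formula (ii)."""
    return rho * sqrt(n / (1.0 + (n - 1) * eps))


def limiting_r(rho, eps):
    """Value approached by formula (ii) as n increases without limit."""
    return rho / sqrt(eps)


# Fraternal correlation 0.5 both with the brother predicted and among the predictors.
for n in (1, 2, 3, 4, 5, 6, 10):
    print(n, f"{multiple_r(n, 0.5, 0.5):.4f}")
print("limit", f"{limiting_r(0.5, 0.5):.4f}")
```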


Compare against these results two parents only in a population where there is no assortative 
mating and the parental correlation = .5. Here $\epsilon = 0$, $\rho = .5$ and $n = 2$, $\therefore R_2 = \tfrac{1}{2}\sqrt{2} = .7071$, or two 
parents will give more information than 10 brothers and sisters, and as much in fact as an 
indefinite number. Suppose the parents tend to select their like, i.e. suppose there is 
assortative mating in the population, say, $\epsilon = +.15$, then with the same intensity of parental correlation 

$$R_2 = .6594,$$ 

or, two parents will give us more information than six brothers and sisters. 


Now this illustration brings out the real nature of the effect of increasing the number of 
variables from which we predict. Such increase has very little value, if those variables are 
fairly highly correlated with each other. To be effective they must be highly correlated with 
the variate we wish to predict and correlated very slightly with each other. 


Even in this case there is a limit to the degree of correlation reached when the number of 
variates is indefinitely increased, namely $\rho/\sqrt{\epsilon}$, and it is clear that if $\rho$ be small and $\epsilon$ fairly large, 
no very great increase of correlation is obtained if we use an indefinitely great number of variates. 
For example if $\rho = .05$ and $\epsilon = .5$, we find $R_\infty = .0707$ only. Even if $\rho$ were .10, we should only 
raise $R$ to .1414, could we predict from an indefinitely large number of such correlated variates*. 
Indeed as long as $\epsilon$ is not less than $\rho$ we gain singularly little by combining large numbers of 
variates. For example if $\rho$ were .4, and $\epsilon = .4$, ten such variates would only raise the correlation to 
.5898, and an indefinitely large number to .6325, which is less than double the single correlation. 
Yet there are apparently many persons who believe that by taking a number of low correlations, 
a high relationship can be reached! 

Actually there is a limit to what relations can possibly exist between a variate $x_0$ and a series 
of equally correlated variables $x_1 \dots x_n$. Since $R$ must be less than unity, we have 

$$\rho \sqrt{\frac{n}{1 + (n-1)\epsilon}} < 1,$$ 

or 

$$\epsilon > \frac{n\rho^2 - 1}{n - 1}.$$ 

Thus if $n = 10$ and $\rho = .5$, $\epsilon$ must be $> .1667$. Or, it would be impossible for 10 variates to 
have a correlation .5 with another variable, and a zero correlation with each other. 
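
The numerical statements of the last two paragraphs follow directly from formula (ii), its limit $\rho/\sqrt{\epsilon}$, and the inequality just given. The sketch below recomputes them; the function names are illustrative assumptions.

```python
from math import sqrt


def multiple_r(n, rho, eps):
    """Formula (ii): n variates, each correlated rho with x_0 and eps with one another."""
    return rho * sqrt(n / (1.0 + (n - 1) * eps))


def limiting_r(rho, eps):
    """Limit of formula (ii) for an indefinitely great number of variates."""
    return rho / sqrt(eps)


def least_epsilon(n, rho):
    """Lower bound on eps implied by R < 1; a negative value imposes no restriction from this source."""
    return (n * rho ** 2 - 1.0) / (n - 1)


print(round(limiting_r(0.05, 0.5), 4))      # rho = .05, eps = .5
print(round(limiting_r(0.10, 0.5), 4))      # rho = .10, eps = .5
print(round(multiple_r(10, 0.4, 0.4), 4))   # ten variates, rho = eps = .4
print(round(limiting_r(0.4, 0.4), 4))       # endless variates, rho = eps = .4
print(round(least_epsilon(10, 0.5), 4))     # ten variates each correlated .5 with x_0
```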


* Even if $\rho$ were .10 and $\epsilon$ as low as .10 we should not raise $R$ for endless variates of this order of 
correlation above .3163, while from compounding ten such variates we should only obtain a correlation 
about double that of a single variate, i.e. $R = .2294$. 
If we suppose a number of variates $n$ to be uncorrelated with each other, but correlated 
$r_{01}, r_{02}, \dots r_{0n}$ with another variable $x_0$, then we have from the determinant as given below 

$$\Delta = \begin{vmatrix} 1 & r_{01} & r_{02} & \dots & r_{0n} \\ r_{01} & 1 & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ r_{0n} & 0 & 0 & \dots & 1 \end{vmatrix} = \left(1 - r_{01}^2 - r_{02}^2 - \dots - r_{0n}^2\right) \Delta_{00},$$ 

$$\therefore\quad R^2 = r_{01}^2 + r_{02}^2 + \dots + r_{0n}^2,$$ 

or 

$$R = \sqrt{n}\,\sqrt{\frac{r_{01}^2 + r_{02}^2 + \dots + r_{0n}^2}{n}}.$$ 


Therefore, if $n$ variables, uncorrelated among themselves, be correlated with an additional 
variable, it is necessary that the root mean square of their correlations should be less than 
$1/\sqrt{n}$. We see therefore that it must either be impossible to find a large number of variables 
uncorrelated among themselves, which are correlated with an additional variable, or else their 
correlations with this variable must be extremely low. The last result shows us the fallacy 
of supposing that correlations are simply added together for a combined effect; clearly when 
the variates are uncorrelated among themselves, we add by the sum of the squares. For 
example, if $r_{01} = r_{02} = \dots = r_{0n} = .03$ one hundred such variables would only raise $R$ to .30. On 
the other hand if the variates are highly correlated together, say $\epsilon = .81$, an indefinitely great 
number of such variables would only raise the multiple correlation to .0333, if the individual 
correlation were .0300. 
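
The contrast between adding correlations and adding their squares can be shown in a few lines. This is a minimal sketch with illustrative function names; it treats only the two cases named in the text, mutually uncorrelated predictors and equally correlated predictors.

```python
from math import sqrt


def multiple_r_uncorrelated(correlations):
    """R when the predicting variates are uncorrelated among themselves."""
    return sqrt(sum(r * r for r in correlations))


def limiting_r(rho, eps):
    """Limit of R for endless variates, each correlated rho with x_0 and eps with one another."""
    return rho / sqrt(eps)


rs = [0.03] * 100
print(round(sum(rs), 2))                       # naive addition gives 3, an impossible correlation
print(round(multiple_r_uncorrelated(rs), 2))   # addition by squares gives .30
print(round(1.0 / sqrt(len(rs)), 2))           # bound on the root mean square correlation
print(round(limiting_r(0.03, 0.81), 4))        # highly inter-correlated variates
```

The naive sum exceeds unity, which no correlation can do, whereas the sum of squares gives the .30 quoted above.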


We are now in a position to apply our results to the problem of the relative intensity of 
heredity and environment. This problem has been singularly misunderstood especially by the 
popular exponents of Eugenics. Some illustrations of this may be given here. Major Leonard 
Darwin writes as follows in the Journal of the Eugenics Education Society: “It is impossible 
to compare heredity as a whole with environment as a whole as far as their effects are 
concerned ; for no living being can exist for a moment without either of them*. Moreover, 
in order to compare two things so as to be able to use the words more or less in connection 
with such a comparison, we must have a common unit of measurement applicable to them 
both. But what is the unit by which both heredity and environment may be measured ? 
I myself have no idea. May we not be discussing questions as illogical as enquiring what 
portion of the area of a rectangle is due to its width and what to its length? Is it ever wise 
to use words in scientific literature without endeavouring to attach a definite meaning to them†?” 


It is hard to conceive a paragraph of the same length more full of evidence of complete 
ignorance of the methods used in modern science for comparing correlated variates! Yet it 
goes out as the opinion of the President of a Society which is endeavouring to spread the 
scientific doctrines of Eugenics among the people! Major Darwin begins by stating that it is 
needful to have a common unit of measurement in order to compare two variates. To begin 
with we are not comparing two things, but we are comparing the influence of two things on 


* There would in our sense be no heredity if the average child born to noteworthy parents was equal 
to the average child of the whole community. Yet it is perfectly easy to understand how living beings 
could exist under such a law of reproduction. Major Darwin seems to be confusing two things, the fact 
that a man is born true to his species, and the fact that he resembles his immediate ancestry. It is 
the latter fact only which concerns us when we compare heredity and environment, i.e. how variation of 
immediate ancestry affects the individual’s physical or mental characters. But without such heredity 
individuals might quite well exist. 

† The Eugenics Review, Vol. v. p. 152. The italics are mine. 
a third, i.e. the intensity of a certain environmental influence and the intensity of a certain 
somatic character in the parent, say, on the intensity of the somatic character in the off- 
spring. Yet Major Darwin tells us we cannot do this because we cannot measure these 
things in the same unit !—How suavely yet forcibly Sir Francis Galton himself would have 
ridiculed such ignorance in high places as is passed by the Editor of the Eugenics Journal !— 
We can hear him now telling us how the intensity of each character could be measured by 
its grade, and how the problem turned on whether the same change in grade in the environ- 
ment and in the parental somatic character produced greater or less change in the grade of 
the filial somatic character. When we inquire whether inter-racially stature is more closely 
related to cephalic index or to eye colour, are we to be met by the statement that these 
characters cannot be compared because they cannot be measured in a ‘common unit,’ and 
then be told that it is not “wise to use words in scientific literature without endeavouring to 
attach a definite meaning to them?” Every trained statistician knows that each character 
is measured in the unit of its own variability—in what he terms its standard deviation*, 
and that this standard deviation provides him with a measure of the frequency of each value 
of the variate in question. It seems to me that the only correct sentence in this paragraph, 
is the author’s statement that he himself has no idea what unit is ‘common’ to heredity and 
environment. 


But our author continues : 


“Take any quality, and we find that the human beings composing any community differ 
more or less considerably as regards that quality. Now we can measure the correlation 
between the differences shown in this quality and the differences of environment to which 
the members of the community in question had previously been exposed†. This is one 
correlation. Then we can also measure the correlation coefficient between, say, father and 
son, as regards the quality in question. Here is a second correlation; and if we are told 
that the relative influence of environment and heredity is measured by the ratio between 
these two correlation coefficients, we certainly do thus get a clear conception of what is 
meant‡.” 


But has the writer really obtained a clear conception of what such coefficients of correla- 
tion mean, when in the next paragraph he continues : 


“Imagine an ideal republic, in some respects similar to that designed by Plato, where not 
only were all the children removed from their parents, but where they were all treated exactly 
alike. In these circumstances none of the differences between the adults could have anything 
to do with the differences of environments, and all must be due to some differences in inherent 
factors. In fact the environment correlation coefficient would be nil, whilst the hereditary 
correlation coefficient might be high §.” 


Could any better evidence be adduced that the President of the Eugenics Education Society 
did not know what a coefficient of correlation meant at that date? The coefficient of correlation 
for the environment might be anything from —1 to + 1; the only obvious fact would be that you 
could not find its value, except in the form 0/0, from an environment which precluded any 
measure of variation. How again Sir Francis would have smiled at the notion that the 
coefficient of correlation for a constant environment must be nil. Why should we follow such 


* Of course he may or does need other constants to help in the description of the frequency. 

† loc. cit. p. 153. 

‡ This seems to contradict the writer’s previous assertion that two things are incomparable, if they 
have not a ‘common unit’! 

§ I wrote at once to Major Darwin pointing out the error of such a statement and he withdrew it in 
the next number. But the harm done by an article of this kind cannot be reversed by correcting a 
single misstatement. 
advice as that given by the President of the Society to avoid as far as possible “such phrases as
the relative influence of heredity and environment,” when on his own showing he does not in 
the least appreciate the methods by which this relative influence is measured ? 
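
The point that a constant environment yields the indeterminate form 0/0, and not a correlation of nil, can be checked numerically. The following is a minimal Python sketch and no part of the original note; the function name `pearson_r` and the sample figures are illustrative assumptions. It returns `None` whenever one of the two variables shows no variation, because numerator and denominator of the product-moment coefficient then both vanish.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation; None when it takes the form 0/0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # A variable without variation makes both numerator and
        # denominator zero: the coefficient is undefined, not nil.
        return None
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: every child receives exactly the same environment.
environment = [1.0, 1.0, 1.0, 1.0]
character = [2.0, 3.0, 5.0, 7.0]
constant_case = pearson_r(environment, character)

# A case with variation, for contrast (perfect linear relation).
varying_case = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Here `constant_case` is `None`, while `varying_case` is unity; the ideal republic supplies no material from which an environmental coefficient could be found at all.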


Then Major Darwin continues : “Surely what we want to know is how we can do most good— 
whether by attending to reforms intended to affect human surroundings, or to reforms intended 
to influence mankind through the agency of heredity. But does this ratio [that of the environ- 
mental and hereditary correlation] give us any sure indication of the relative amount of attention 
which should be paid to these two methods of procedure?” Our only reply can be that these 
correlations certainly do, and that as long as the President of the Eugenics Education Society 
fails to grasp their meaning, he is doing grave harm to the science of eugenics. 

We measure the change in the character of an individual which would be produced by a 
change of a like or an allied character in a parent, such change being one of which we have 
experience ; we measure the change which would be produced in the character of the individual 
by changes in the environment such as we have experience of, i.e. when we move the individual 
from a badly ventilated to a well ventilated house, from a back to back to a through house, from 
a low wage to a high wage, and so forth, and we find the resulting changes are of a wholly 
different order in these cases to what happens when we change the physical characters, the 
health or habits which define the parents. It is on the basis of this that we assert that the relative 
strength of heredity is far greater than the strength of environment. To this reasoning, apart from 
such arguments as the above or those to be immediately dealt with, reply is only made by talk as to 
the impossibility of an individual surviving if you deprived him of his normal environment! It 
would be just as reasonable to assert that everything must be due to heredity, because a race of 
supermen would breed supermen ! What the scientific eugenist has endeavoured to measure are 
the influences of such range of differences in environment as occur in everyday experience and 
are therefore producible from the political, economic and social standpoints, not the absence of 
all environment at all. But while this is recognised by some of the popular eugenic writers, they 
have approached the problem from another standpoint which indicates equally how little they 
grasp modern statistical theory. We admit, they say, that the environmental correlations may 
be of the order .03 or .05 and the inheritance correlations of the order .50. But this is the
correlation of one character in environment. You ought to take ten or twenty, and then you
will have multiplied up environment to be more effective than heredity, for .03 × 20 = .60. In the
first place we may suggest that it would be just as reasonable, if the argument were a valid one,
to multiply up the favourable hereditary characters, to take weight, height, muscular activity, 
health, intelligence, caution, and many other desirable factors, and these not only in one parent 
but in brothers, sisters, aunts, uncles and grandparents and treat the cross-correlation of these 
with the character under discussion. But although every improvement in stock would reflect 
itself in improvement in offspring, correlations cannot be added together—any more than forces 
by simple arithmetical addition. You do not combine two hereditary correlations any more than 
two environmental correlations by mere addition. You must proceed by the combinatory process 
indicated at the commencement of this paper, which is one of course familiar to every trained 
statistician.

Yet here is a statement which the Editor of the Eugenics Review admits to its pages without
contradiction*:

The point that we wish to make is this. In the face of so much ignorance concerning, not only
heredity itself, but also its complement, the influence of environment, how can any one be justified in 
making sweeping generalisations with reference to these subjects? 

Such generalisations, however, are made. It is said that we have a definite proof that inheritance is 
of far greater strength than environment. This argument takes the following shape. The correlations 
between parent and offspring for a number of features have been calculated, and the mean is found to 


* Vol. v. p. 219, in an article by A. M. Carr-Saunders. 
be somewhere about .5. Correlations between individuals and various aspects of their environment have
also been worked out (as, for instance, mental ability and conditions of clothing, or between myopia and
the age of learning to read*) and the mean value is found to be about .03. It is then said that the
mean “nature value” is at least five to ten times as great as the mean “nurture value,” and upon this
is founded the generalisation that “nature” is of far greater importance than “nurture”†. It may be
questioned, however, whether such a comparison does not involve a serious mistake. For if we consider
the two mean values that are compared, we find that, whereas the “mean nature value” is the mean
value of a number of observations, all of which provide a full measure of the strength of heredity, the
“mean nurture value” is the mean value of a number of observations, each of which measures only the
strength of some one isolated aspect of environment. It would appear then that the full strength of
inheritance has been compared, not with the full strength of environment, but with the average of a
number of small isolated aspects of the latter. As a matter of fact it is quite beyond our power at
present to sum up the full effect of environment upon the individual and compare it with the full effect
of heredity. We are, therefore, justified in saying that we neither know in particular cases how far the
environment can produce any effect, nor can we make any definite statement as to the comparative
strength of “nature” and “nurture.”

Now this is the doctrine passed by the Editors of the Eugenics Review, the journal of a
society, which has assumed the mantle of Francis Galton‡, and it is passed, because the
editorial committee of that society does not grasp the meaning of multiple correlation! The
passages in italics have been so printed to draw our readers’ attention to them. In the first
place, of course, a single correlation coefficient does not provide a full measure of the strength
of heredity. In the table cited the coefficients are those for one parent or for one brother or
sister. Each relative (and those for independent stocks are either non-correlated or inter-
correlated very slightly) provides such a coefficient, and further each character in such relatives
may be correlated with the character under discussion in the subject in question. In the next
place the environment factors do not consist of “some one isolated aspect of environment.”
All these factors or aspects are closely interlinked, and this was a fact well-known to the
workers in the Galton Laboratory. The real interpretation of such a difference as .50 and .03
in the average values of single coefficients can only be appreciated by those who are conversant
with the theory of multiple correlation, and it is quite clear that those who profess to guide the
public in this very difficult problem, which is essentially a scientific problem, lack any adequate
knowledge of the sole instrument by which any conclusion can be drawn.

The writer appears to be wholly ignorant of the nature of multiple correlation in the first 
place, and in the second entirely to overlook the very high correlations which exist between 
environmental factors. Bad wages, bad habits, bad housing, uncleanliness, insanitary sur- 
roundings, crowded rooms, danger of infection, etc., etc. are all closely associated together, 
and while the order of correlation between environmental and physical characters is low, that 
between individual environmental factors is in our experience very high. Thus the problem of 
multiple correlation illustrates closely the theory developed in the first part of this note; we 
have to deal with a low $\rho$ and a high $\epsilon$.


For example, if we take the environmental factors to have an average inter-correlation of .70,
then an infinity of such factors for a mean environmental and individual correlation of .03 would


* As the writer phrases this correlation, it is very liable to be misinterpreted. What the Galton 
Laboratory did was to show that myopia was very markedly inherited, and that the theory that it was 
largely due to school environment was incorrect, because children who began to read late, i.e. went late 
to school, were not less myopic than those who went early. 

† Karl Pearson, Nature and Nurture, Eugenics Laboratory, Lectures vi. p. 25.

‡ If there was one point on which Francis Galton felt strongly and wrote it was on this point of the
relatively great intensity of “nature” as compared with “nurture.” I do not stand alone in recognising
it as an essential part of his teaching: “I am inclined to agree with Francis Galton,” writes Charles
Darwin, “in believing that education and environment produce only a small effect on the mind of
anyone, and that most of our qualities are innate.”
only raise the correlation to .0359 against a single parental correlation of .5000; if the correlation
was .05 instead of .03, we should have the total possible environmental multiple correlation .0598
as against .5000. Even if we raise the average environmental correlation to .1 and the inter-
environmental factor correlation be reduced to .5, the multiple correlation of an infinity of factors
is only .1414 as against the single factor of heredity .5000. Even if we could pick out one
hundred environmental factors which had no inter-correlations (which experience shows is
wholly impossible) and each of these independent factors was correlated to the extent of .05
with the mental or physical characters of an individual they would only just reach the hereditary
influence of a single character in a single parent.
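
The figures of this paragraph can be reproduced on one assumption, namely that the multiple correlation of n factors, each correlated to the extent rho with the character and to the extent eps with one another, is R = sqrt(n rho^2 / (1 + (n - 1) eps)), which tends to rho / sqrt(eps) as n increases without limit. This is the usual result for equally correlated predictors; that it is the combinatory process here intended is an inference from the quoted values. The Python sketch below is illustrative only, and the names `multiple_r` and `limiting_r` are hypothetical.

```python
import math


def multiple_r(n, rho, eps):
    """Multiple correlation for n equally correlated factors.

    rho: correlation of each factor with the character,
    eps: correlation of every pair of factors with each other.
    """
    return math.sqrt(n * rho ** 2 / (1 + (n - 1) * eps))


def limiting_r(rho, eps):
    """Limit of multiple_r as the number of factors tends to infinity."""
    return rho / math.sqrt(eps)


# Cases quoted in the text.
case_03 = limiting_r(0.03, 0.70)          # environmental, rho = .03
case_05 = limiting_r(0.05, 0.70)          # environmental, rho = .05
case_10 = limiting_r(0.10, 0.50)          # rho = .1, eps = .5
hundred_free = multiple_r(100, 0.05, 0.0) # 100 uncorrelated factors
two_parents = multiple_r(2, 0.50, 0.0)    # no assortative mating

# The fallacious "multiplying up" criticised in the text, for contrast.
naive_sum = 0.03 * 20
```

The naive product .03 × 20 gives .60, whereas twenty such factors with an inter-correlation of .70 give `multiple_r(20, 0.03, 0.70)`, which is below .04.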

Now let us suppose an absolutely idle case, namely that the environmental factors had the
same correlation as a parent, i.e. .5, with the character of the individual, and only a correlation
of .6 with each other, then if we could use an indefinitely great number of such factors the
multiple correlation would only be $.5/\sqrt{.6} = .6455$, while the correlation with two parents, with
no assortative mating, would be .7071. Even with assortative mating, it suffices to take only
the four grandparents into account to show that heredity acts in excess of an environmental
scheme even so preposterous as is suggested above. If we take the parental correlations .50, the
grandparental .25, and those of assortative mating .15, we have for the determinant:


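$$
\Delta = \begin{vmatrix}
1 & .50 & .50 & .25 & .25 & .25 & .25 \\
.50 & 1 & .15 & .50 & .50 & 0 & 0 \\
.50 & .15 & 1 & 0 & 0 & .50 & .50 \\
.25 & .50 & 0 & 1 & .15 & 0 & 0 \\
.25 & .50 & 0 & .15 & 1 & 0 & 0 \\
.25 & 0 & .50 & 0 & 0 & 1 & .15 \\
.25 & 0 & .50 & 0 & 0 & .15 & 1
\end{vmatrix}
$$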


Add together the second and third rows multiplied by .3951, and the fourth, fifth, sixth and
seventh multiplied by .0456 and subtract the result from the first. The first row then becomes


| .5593, 0, 0, 0, 0, 0, 0 |


the others of course remaining the same. 


Hence $\Delta = .5593\,\Delta_{00}$,
and $R^2 = 1 - \Delta/\Delta_{00} = 1 - .5593 = .4407$.
Therefore $R = .6639$.
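
The row multipliers and the value of R can be verified by solving the normal equations directly. The sketch below is a plain Python illustration and not the author's own procedure; the ordering of the six relatives (father, mother, then the two paternal and the two maternal grandparents) and the helper name `solve` are assumptions made for the example.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Correlations among the six relatives: parent-parent and grandparent
# pairs .15 (assortative mating), parent with own parents .50, else 0.
relatives = [
    [1.00, 0.15, 0.50, 0.50, 0.00, 0.00],
    [0.15, 1.00, 0.00, 0.00, 0.50, 0.50],
    [0.50, 0.00, 1.00, 0.15, 0.00, 0.00],
    [0.50, 0.00, 0.15, 1.00, 0.00, 0.00],
    [0.00, 0.50, 0.00, 0.00, 1.00, 0.15],
    [0.00, 0.50, 0.00, 0.00, 0.15, 1.00],
]
# Correlations of the individual with parents (.50) and grandparents (.25).
with_subject = [0.50, 0.50, 0.25, 0.25, 0.25, 0.25]

beta = solve(relatives, with_subject)   # the row multipliers
r_squared = sum(b * r for b, r in zip(beta, with_subject))
big_r = math.sqrt(r_squared)
```

The first two entries of `beta` are the multipliers of the second and third rows, the remaining four those of the grandparental rows, and `1 - r_squared` is the reduced leading element of the first row.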


Or together grandparents and parents would influence a man’s character more than an
infinity of environmental factors of the same grade of correlation, because the latter factors
are far more highly correlated together than several of our relatives.


Actually of course we are dealing with average values; the average value of environmental
correlation with individual character being in our experience of the order .03 to .05 and the
inter-environmental factor correlations of the order .5 to .7. But these averages enable us to
appreciate the total effect.


The doctrine taught by the writers in the Eugenics Review, that we know nothing of the
relative intensity of environment and heredity and that it is unwise “to use words in scientific
literature without endeavouring to attach a definite meaning to them” only demonstrates how far
the Editors of that Journal are removed from any appreciation themselves of modern statistical
methods. How far the doctrine is removed from the very strong views held on this point by
Francis Galton, only those who have studied his writings and know how strongly he felt person-
ally on the subject are in the least competent to appreciate.
VI. Formulae for the Determination of the Capacity of the Negro
Skull from External Measurements. 


By L. ISSERLIS, B.A. 


§ 1. Formulae for the determination of the capacity of the human skull from external 
measurements, were obtained by Lee and Pearson*. The material they employed consisted 
of various series of measurements of Bavarian, Aino and Naqada skulls. Measurements of 
Ancient and modern Egyptian and other non-European skulls were employed, chiefly for 
purposes of comparison. The formulae, some of which will be quoted later, were intended 
primarily for the prediction of the capacity of European skulls, from external measurements. 
Doubt has been thrown on several occasions on the applicability of these formulae to the Negro 
skull, one of the reasons alleged being the supposed difference in thickness of the bone of 
European and Negro crania. 

The publication† of the late Dr R. Crewdson Benington’s researches on the negro skull has
made it possible to obtain similar formulae for negro skulls, and to test how far these can 
be applied to the prediction of the capacity of European skulls and conversely to test the 
applicability of Lee and Pearson’s Equations to the negro skull. 

§ 2. The material is fully described in Dr Benington’s Study. The crania dealt with in 
the present paper are Benington’s series A, B, C. 

A. Congo Crania in the Royal College of Surgeons. These crania provide 46 males
and 21 females, as owing to various defects no capacity is available for numbers 25, 38, 48, 54
among the males and numbers 69, 72, 75, 79, 82, 85 among the females. 

B. Crania from the Gaboon, Group I, brought by Du Chaillu from Fernand Vaz in 1864. 
Of the 50 male and 44 female crania in the series, 2 males (numbers 3 and ?) and 1 female 
(number 2) are defective, leaving 48 male and 43 female crania available. 

C. Crania from the Gaboon, Group II, brought by Du Chaillu from Fernand Vaz in 1880.
Two of the 18 males (numbers 12a and 20) and two of the 19 females (numbers 8 and 18) 
are defective. 

Altogether 110 male and 81 female crania have been dealt with. The correlation has been 
calculated of the capacity (C) and the product of the breadth, length and total height (B, L
and H), for each group and for the aggregates of 110 male, and of 81 female crania.

Correlation coefficients have also been calculated for the capacity and breadth, capacity and 
length, and capacity and total height, but for the aggregates of the three groups only. Re- 
gression formulae are given in all cases. It is to be observed that Dr Crewdson Benington’s 
measurements of capacity were taken with mustard seed, packing and measuring glass and 
that the error of measurement or rather his average difference as compared with other workers 
in the Biometric Laboratory was under 10 cm³.

In comparing the regression formulae obtained here, with those given by Lee and Pearson for 
European and other skulls it must be remembered that in all their formulae except (12) and (18) 
of p. 247 they employed the auricular height and not the total height. In the present paper as 
in Dr Benington’s study H denotes the total height. Lee and Pearson denote this by H′ and
use H for the auricular height.

It was not possible here to use the auricular height as it was not available for the whole of 
the Gaboon series B and C. 

* Phil. Trans. Vol. 196, Series A, pp. 225—264. 
† Biometrika, Vol. VIII. Nos. 3 and 4, Dec. 1911.
Taking first the male skulls, the mean value of the capacity and the product BLH, their
standard deviations and the correlations are given in the following table.


TABLE I.

Series               Mean capacity   Mean value of BLH   σ_C        σ_BLH      r_C,BLH
                     in cm³          in cm³              in cm³     in cm³
46 Congo skulls      1344            3303                126.22     282.99     .872
48 Gaboon (1864)     1379            3295                108.30     230.30     .822
16 Gaboon (1880)     1447            3463                109.60     266.42     .808
110 Negro skulls     1375            3323                120.74     265.20     .842

The corresponding regression lines are

for the 46 Congo skulls      C = .0003889 BLH + 59 ± …/√n      (1),
48 Gaboon (1864)             C = .0003865 BLH + 105 ± …/√n     (2),
16 Gaboon (1880)             C = .0003323 BLH + 297 ± …/√n     (3),
110 male negro skulls        C = .0003849 BLH + 96 ± …/√n      (4).

Lee and Pearson’s corresponding equation for males is

C = .000266 LBH′ + 524.6*      (P).


This is not a regression line, but is obtained by method of least squares from the results for 
various races in their table 20. 


The formulae (1) to (4) can be used to predict the capacity of an individual skull from external
measurements. The probable errors of the mean were calculated by the formula $0.67449\,\sigma/\sqrt{n}$,
where n is the number of skulls in the group to which the formula is applied. If we substitute
in (1) to (4) the mean values of B, L, H for the Bavarian male skulls used by Lee and Pearson,
viz.:
B = 150.5,
L = 180.6,
H = 133.8,
we obtain,
from (1), C = 1474 ± …/√n,
from (2), C = 1471 ± …/√n,
from (3), C = 1506 ± …/√n,
from (4), C = 1496 ± …/√n.


* Loc. cit. Equation (12). H′ = total height.
The measured capacities of these German skulls have a mean value of 1503 c.c., a result
which is in very close agreement with (4), the formula based on 110 skulls. 1503 is the mean
capacity of 100 skulls, so that the term in 1/√n of (4) is 6.55. Thus the difference between the
actual mean capacity of German skulls and the mean capacity estimated by the negro formula is
less than 10 cm³, although the mean capacity of German male skulls exceeds that of negro males by


1503 − 1375 = 128 cm³.

If the above values of B, L, H are substituted in Lee and Pearson’s formula (P) on p. 4
we obtain C = 1492.


On the other hand if we substitute the mean values of the dimensions of the 110 male negro 
skulls, B=137, L=178, H=135 in formula P we obtain C=1400 as compared with the measured
mean of 1375. 
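
The substitutions of this section are easily repeated. In the Python sketch below, which is illustrative only, formula (4) is taken as printed, and the constant of Lee and Pearson's formula (P) is taken as 524.6, an assumption where the print is unclear but one that reproduces both of the values 1492 and 1400 quoted in the text; L = 180.6 for the Bavarian males is likewise read from the later section on thickness. The function names are hypothetical.

```python
def capacity_negro_male(b, l, h_total):
    """Formula (4): regression based on the 110 male negro skulls."""
    return 0.0003849 * b * l * h_total + 96


def capacity_lee_pearson_male(b, l, h_total):
    """Formula (P), with the constant assumed to be 524.6."""
    return 0.000266 * b * l * h_total + 524.6


# Mean dimensions in mm: Bavarian males, and the 110 negro males.
bavarian = (150.5, 180.6, 133.8)
negro = (137.0, 178.0, 135.0)

bavarian_by_4 = capacity_negro_male(*bavarian)        # measured mean 1503
bavarian_by_p = capacity_lee_pearson_male(*bavarian)
negro_by_p = capacity_lee_pearson_male(*negro)        # measured mean 1375
```

Rounded to the nearest cm³, the three results are 1496, 1492 and 1400, the values given above.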


This is not as good a reconstruction as our formula (4) or as the formulae of Lee and Pearson 
employing auricular height, and is probably due to the fact that P is obtained by the method of 
least squares from 11 means only. 


§ 4. An approximation to the influence of the thickness of the bone of the skull on pre-
dictions of capacity from external measurements can be obtained by differentiating the equation

$$C = kBLH + \text{const.}$$

and putting $dB = dL = dH = t$. We obtain

$$dC = k\,(BL + LH + HB)\,t,$$

or, if we observe that in the equations the constant is comparatively small,

$$\frac{dC}{C} = \left(\frac{1}{B} + \frac{1}{L} + \frac{1}{H}\right) t.$$

With B = 150.5, L = 180.6, H = 133.8,

$$\frac{dC}{1500} = t\,(.02) \text{ approximately.}$$

Thus a difference of 10 cm³ in capacity corresponds to a difference of ⅓ mm. in thickness,
which is about 5% of the thickness (say 6 mm.) of the human skull.
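
The arithmetic of this approximation may be set out as follows. The Python sketch is illustrative and rests on the same simplification as the text, that the constant of the regression is neglected so that dC/C = (1/B + 1/L + 1/H) t; the names `relative_change_per_mm` and `t_for_10cc` are hypothetical.

```python
def relative_change_per_mm(b, l, h):
    """dC/C produced by unit change t = 1 mm in each of B, L and H."""
    return 1 / b + 1 / l + 1 / h


# Mean Bavarian male dimensions in mm, and a capacity of about 1500 cm^3.
per_mm = relative_change_per_mm(150.5, 180.6, 133.8)

# Change of thickness t (in mm) answering to 10 cm^3 of capacity.
t_for_10cc = (10 / 1500) / per_mm

# The same change as a fraction of an assumed 6 mm thickness of bone.
share_of_thickness = t_for_10cc / 6
```

Here `per_mm` is nearly .02, `t_for_10cc` about a third of a millimetre, and `share_of_thickness` between 5 and 6 per cent.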


We may fairly conclude then, that there is no appreciable difference in the thickness of the 
negro skull as compared with the European. 


§ 4. The female crania yield very similar results. The following is the table for the female 
skulls. 


TABLE II.

Series               Mean capacity   Mean BLH    σ_C        σ_BLH      r_C,BLH
                     in cm³          in cm³      in cm³     in cm³
21 Congo skulls      1206            2858        …          …          …
43 Gaboon (1864)     1232            2924        126.7      270.95     .8814
17 Gaboon (1880)     1240            2964        97.31      265.8      .8560
81 Negro skulls      1227            2956        117        255.72     .7668
The corresponding regression lines are


21 Congo skulls      C = .0003645 BLH + 164 ± 45/√n     (5),
43 Gaboon (1864)     C = .0004122 BLH + 27 ± …/√n       (6),
17 Gaboon (1880)     C = .0003134 BLH + 311 ± …/√n      (7),
81 Negro skulls      C = .0003508 BLH + 204 ± …/√n      (8).

The corresponding Lee and Pearson formula obtained by the method of least squares is
C = .000156 LBH′ + 812      (Q).
The mean values of B, L, H′ for the Bavarian female skulls discussed by Lee and Pearson are
B = 144.11,
L = 173.59,
H′ = 128.07.
With these values, we deduce from (5) to (8) the following values for C.
(5) C = 1331 ± 45/√n,
(6) C = 1347 ± …/√n,
(7) C = 1315 ± …/√n.

The mean of the measured values of the capacities of these skulls is 1337 and formula (8) 
based on 81 negro skulls gives a result in very close agreement. 

If the above values of B, L, H′ are substituted in Lee and Pearson’s formula Q we obtain
C = 1284, a result which differs from the true value much more seriously than the prediction by
the negro regression formula.

Again, if we insert the mean values

B = 130.75,

L = 171.38,

H = 129.81,
of the 81 female negro crania in the formula Q we get C=1266 as against the mean of the
measured values which is C=1227, demonstrating again the fact that the formulae P, Q based
on 11 means are not as good as the regression formulae.


§ 5. We add tables of the correlation between capacity and breadth, capacity and length, 
and capacity and total height for the 110 male and the 81 female skulls, and for comparison 
reprint the corresponding value for German (Bavarian) skulls. 
TABLE III.

Males.

Correlation              Negro                   German
Capacity and Breadth     .4977                   .6720
Capacity and Height      .6080 (total height)    .2431 (auricular height)
Capacity and Length      .7433                   .5152

TABLE IV.

Females.

Correlation              Negro                   German
Capacity and Breadth     .7578                   .7068
Capacity and Height      .5450 (total height)    .4512 (auricular height)
Capacity and Length      .6699                   .6873


The corresponding regression lines are given in the tables below : 


TABLE V.

Males.

        Negro                                 German
(9)     C = 12.6356 B − 356.1 ± 105/√n        C = 13.432 B − 517.34
(10)    C = 12.8301 L − 1087 ± …/√n           C = 9.892 L − 289.55
(11)    C = 15.3265 H′ − 694 ± 96/√n          C = 5.264 H + 868.05 (auricular height)
        (H′ = total height)

TABLE VI.

Females.

        Negro                                 German
(12)    C = 17.872 B − 1114 ± …/√n            C = 15.716 B − 927.66
(13)    C = 12.46 L − … ± 87/√n               C = 12.055 L − 755.53
(14)    C = 10.871 H′ − 184 ± 98/√n           C = 10.993 H + 82.13 (auricular height)
        (H′ = total height)
No great degree of accuracy can be expected in reconstructing the capacity of a skull from a
single measurement, but the remarkable difference of formula (11) for negro skulls from the
corresponding German formula is of course due to their referring to different measurements
of the height. If we insert H = 133.8 in (11), which is the mean total height of the Bavarian
skulls, we get C = 1356.7 ± 96/√n instead of the measured mean C = 1503 cm³.

Similarly equation (9) gives C = 1555.6 ± 105/√n instead of 1503 when we insert the German
mean B = 150.5.


Thus (9) to (14) are of little use for our purpose.


VII. Note on a Negro Piebald. (C. D. MAYNARD.) 


THE remarkably interesting photograph of a negro piebald on Plate X has been forwarded 
to the Editor by Dr C. D. Maynard. The native comes from the district round Chai Chai. 
Dr Maynard writes from Ressano Garcia, and states that the hospital attendant took the 
photograph. The extraordinary interest of the case arises from the fact that the thighs and 
feet are of normal negro pigmentation, but in the other patches we have varying degrees of 
pigmentation of the skin down to albinotic white. Unfortunately there is no dorsal view, but 
the back is stated to be also affected with albinotic areas. The boy reported that he was 
in the same condition when born, and that the nature and areas of the pigmentation had not 
altered. 


VIII. Note on Infantile Mortality and Employment of Women, from 
the Report on Condition of Woman and Child Wage-earners in the United 
States, Volume XIII. Infant Mortality and its Relation to the Employment 
of Mothers. 

By ETHEL M. ELDERTON. 


THE author of this Report emphasizes the difficulty of determining the effect of women’s
employment and points out that 


“It would be possible to draw positive conclusions as to the relative importance of this particular
factor only by point-to-point comparison of the infant mortality for a period of years in two large 
communities, or two classes of large communities, in which all the material conditions were sub- 
stantially common, with the single important exception that in one a considerable proportion of the 
married female population of child-bearing age were at work outside of their homes and in the other 
community with which the comparison was made none of the women were so employed. 


To admit of entirely sound conclusions, it would be necessary that the populations (and especially
the women) of both communities should be of like ages, races, and physical health, that their living
conditions should be practically identical, and that, in a general way, the child-bearing women should 
be of about the same grade of intelligence....... In default of some such comparison on a broad scale of 
the mortality of the infants of working and non-working women of similar ages, races, intelligence, and 
living conditions, no one can determine accurately how many of the deaths of working women’s infants 
are due to the mother’s work and how many to the other conditions of their lives and environment.”
(p. 18). 


The author illustrates the point by taking the six New England States and giving the infant 
deathrate, percentage of women of 16 years and over who are breadwinners, percentage of foreign- 
born to the population and percentage of population living in towns of 4000 and more inhabitants, 
and showing that, though the states with the highest infant mortality have also the largest 
number of women employed, they have also the largest percentage of foreign-born and of those
living in urban surroundings, and that it is therefore impossible without further investigation to 
assign the infant deathrate to any of these three factors. 


A further investigation has been undertaken into the 32 Massachusetts cities and the death- 
rate under a year is given, the percentage of foreign-born, the births per 1000 of the population*, 
the percentage of women gainfully employed and the percentage illiterate, and a comparison is made 
between the ten cities with the highest and the ten cities with the lowest infant deathrate and 
percentage of women employed and the other factors enumerated. The conclusion is reached that 
“These comparisons indicate, superficially at least, that a more direct relation exists between 
infant mortality and the birthrate, the percentage of foreign-born, and the percentage of female 
illiteracy than between infant mortality and the employment of women.” (p. 38). 


There can be no doubt that a direct study of the infant mortality in relation to women’s 
employment can only properly be made, when we confine our attention to women, employed and 
unemployed, who are actually mothers and live in the same town, and when we correct for age†,
and if possible home conditions. Still if we take a series of different towns the right method 
must be to correct by the method of partial correlation for such divergent factors as we are 
able to ascertain and allow for in the series of towns investigated. I have endeavoured to apply 
modern statistical methods to the data of this Report, taking as measures of the environmental 
conditions in the towns: D = the general deathrate, i = percentage of illiteracy, f = percentage of
foreign-born population, e = percentage of females employed 10 years of age and upwards (note,
not percentage of employed mothers, so we may be largely measuring effect of child labour on
future motherhood), and d = deaths under one year per 1000 births. Then we have for cor-
relations:

$r_{de} = .68, \quad r_{di} = .70, \quad r_{df} = .74.$
Hence numbers of foreign-born and of illiterate appear to be slightly more influential on infantile 
mortality than employment of women. These values are certainly high and the first is the sort 
of crude value which is used as an argument against the employment of women. Proceeding to 
partial correlations we have 

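${}_{i}r_{de} = .36, \quad {}_{f}r_{de} = .43, \quad {}_{e}r_{di} = .48,$

${}_{f}r_{di} = .42, \quad {}_{e}r_{df} = .57, \quad {}_{i}r_{df} = .51.$
We next corrected for two factors and found:

${}_{if}r_{de} = .34, \quad {}_{ef}r_{di} = .12, \quad {}_{ei}r_{df} = .43.$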


Thus we see that illiteracy has least influence on the infantile deathrate and the presence of 
foreign-born most. 


But even the presence of foreign-born and of illiterates is not a very complete measure of 
environmental effects liable to influence the infantile mortality in different towns as apart from 
employment of women. Many women employed means industrial conditions and possibly 
generally bad environment. I have taken as a measure of this the general deathrate D and find 

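$r_{Dd} = .71, \quad r_{De} = .47, \quad r_{Di} = .60, \quad r_{Df} = .49.$
Whence I find:
${}_{D}r_{de} = .57, \quad {}_{D}r_{di} = .62, \quad {}_{D}r_{df} = .75,$
${}_{D}r_{fe} = .49, \quad {}_{D}r_{ei} = .61, \quad {}_{D}r_{if} = .68,$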


showing very substantial relations after correction for a general measure of poor environment. 
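
The correction for a single factor is made by the usual formula for a partial correlation, namely that the correlation of x and y for constant z is (r_xy − r_xz r_yz)/√((1 − r_xz²)(1 − r_yz²)). The Python sketch below illustrates it for infant mortality (d) and employment of women (e) with the general deathrate (D) held constant. It is not the author's computation; the function name `partial_r` is hypothetical, and the three input values .68, .71 and .47 are the readings of the printed coefficients adopted here, which in a worn impression must be treated as assumptions.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# Assumed readings: r_de = .68, r_Dd = .71, r_De = .47.
d_and_e_given_D = partial_r(0.68, 0.71, 0.47)
```

With these readings `d_and_e_given_D` is about .56, so that a substantial relation remains after the general deathrate has been allowed for.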


* The author is not very confident of the full accuracy of the complete registration of births. 
† Young women are often employed up to the birth of their first one or two children, but the death-
rate of these elder-born is heavier than the deathrate of those who immediately follow. 
Next proceeding to allow for two factors we find
… = .23,  … = .44,  … = .35,


the latter result shows that general deathrate and illiteracy are about equally influential on the 
relation of employment of women to infantile mortality. Finally I corrected for all three factors 
and found : 

${}_{ifD}r_{de} = .28$


or 60% of the crude correlation $r_{de} = .68$ is due to women being most employed in towns where
the general deathrate is high, where illiterates are frequent and the population is largely foreign-
born. How much further the relationship would be reduced, could we equalise other features of
these Massachusetts cities, it is not possible to predict. The examination of the individuals in
one city appears to me to be the only satisfactory method of disentangling the numerous factors 
which influence infant mortality. We commend, however, the study of the first part of this 
Report, as it deals very clearly with the difficulties which arise, and will counteract the tendency, 
which is prevalent, to assert causation whenever association is observed. The author lays stress 
on avoiding such logical confusions. 


Part II of the Report deals with infant mortality and its relation to the employment of
mothers in Fall River, Massachusetts. In 1908 the attempt was made to visit the homes of
each of the mothers of the 859 infants who died during the year and to ascertain details con-
cerning her occupation, etc. In 279 cases the family could not be traced. In 266 cases prior to
the birth of the child the mother was at work outside the home while in 314 cases the mother’s 
work was limited to household duties or other work carried on entirely at home. Thus only the 
cases of deaths are dealt with and the causes of death are compared in the two groups of cases 
(1) when the mother was at work outside the home prior to the birth of the child and (2) when 
the mother’s work was carried on entirely in the home. 


I hold that this method will never prove as satisfactory as that employed in districts in 
England ; in England certain districts are chosen and every baby within that area is visited 
and the deathrate per number born in one group can be compared with another and the 
circumstances surrounding those babies who survive and those who die in the first year of life 
in a given district can be analysed. 


I do not think that the fact that a rather higher percentage of all deaths from gastritis etc. in 
Fall River occur when the mother works away from home and a rather higher percentage from 
congenital debility at birth when the mother does not work away from home will help us much 
in discovering the influence of the employment of the mother on infant mortality, nor do I think 
it will throw much light on the question of stillbirths with which the Report also deals. It is 
found that there are no more stillbirths proportional to all deaths when the mother is industrially 
employed, but it seems to me that this tells us nothing about the number of stillbirths pro- 
portional to all births. The real question is whether mothers employed away from home in 
factory or workshop, whose other circumstances are the same, lose more children in the first year 
of life or have more children stillborn than the mothers who are only employed in their homes 
and I do not think a comparison of causes of death will lead us much further, and I think it may 
lead to difficulties. 
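
The distinction drawn above, between stillbirths reckoned on all deaths and stillbirths reckoned on all births, can be made concrete with a small sketch. The figures below are invented purely for illustration and are not taken from the Report; the function and key names are likewise my own.

```python
def rates(births: int, infant_deaths: int, stillbirths: int) -> dict:
    """Express stillbirths per 100 deaths and per 100 births for one group."""
    return {
        "stillbirths_per_100_deaths": 100 * stillbirths / infant_deaths,
        "stillbirths_per_100_births": 100 * stillbirths / births,
    }


# Hypothetical groups (NOT data from Fall River).
employed = rates(births=1000, infant_deaths=300, stillbirths=60)
at_home = rates(births=1000, infant_deaths=150, stillbirths=30)

print(employed)
print(at_home)
```

In this made-up case the two groups show the same number of stillbirths per 100 deaths, yet one group has twice as many stillbirths per 100 births, which is the quantity the reviewer argues is the real question.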


When dealing with the mother’s work after childbirth in relation to the causes of infant 
mortality it is pointed out that the smaller percentage of deaths from congenital disease among 
the children of mothers who returned to work after childbirth was owing to the fact that most 
of the children dying from this group of causes died in the early weeks of life before the mother 
returned to work. For this same reason the number of deaths from gastritis etc. of children 
whose mothers returned to work is exaggerated, for we are missing out a whole series of illnesses 
which have ceased to add to the child deathrate by the time the mother returns to work and we 
must increase in this way the percentage of deaths of any disease of the later months of a child’s 
first year of life. 


It seems to me that a comparison of deaths in this way will really give very little information ; 
an excess of deaths from one disease means a defect in some other disease; it is shown that 
when the baby is nursed exclusively by the mother 26.0 per cent. of the deaths were from 
diarrhoea, gastritis, etc.; when partly nursed the percentage was 52.3 and when artificial food 
was exclusively employed the percentage of deaths from diarrhoea etc. was 42.9; the baby 
certainly dies less from gastritis when it is breast fed but it dies in greater numbers from other 
causes. Here again there is a difficulty; deaths from congenital diseases fall on the first weeks 
of life when breast feeding is the rule, while deaths from gastritis etc. fall on the later months of 
child life when “partial breast feeding” has become more common and I do not think it is 
possible to draw any conclusions from a comparison of deaths from one disease to deaths from 
all diseases as to the importance of artificial feeding in relation to deaths from gastritis. 
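
Why a percentage of all deaths can mislead is perhaps easiest to see numerically. The sketch below uses hypothetical death rates per 1000 births, invented for illustration only; they are not figures from the Report, and the names are my own.

```python
def percentages(rates_per_1000: dict) -> dict:
    """Convert cause-specific death rates into percentages of all deaths."""
    total = sum(rates_per_1000.values())
    return {cause: 100 * rate / total for cause, rate in rates_per_1000.items()}


# Hypothetical death rates per 1000 births (NOT data from the Report).
partly_nursed = {"gastritis": 60, "other": 55}
artificial = {"gastritis": 90, "other": 120}

for name, group in [("partly nursed", partly_nursed), ("artificial", artificial)]:
    share = percentages(group)["gastritis"]
    print(f"{name}: rate {group['gastritis']} per 1000, {share:.1f} per cent. of deaths")
```

Here the artificially fed group has the higher death rate from gastritis but the lower percentage of deaths from it, simply because its deaths from other causes are also more numerous. Percentages of all deaths must sum to 100, so an excess under one head forces a defect under another.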


Interesting information is given as to the reasons for artificial feeding ; the numbers are not 
large enough to justify any definite conclusions, but this is such an important part of any inquiry 
into the influence of artificial feeding on the infant deathrate that one welcomes its inclusion in a 
report of this kind. 


WE have been requested by Professor F. M. Urban to insert the accompanying announcement. 


ANNOUNCEMENT. 


A prize of One Hundred Dollars ($100.00) is offered for the best paper on the Availability of 
Pearson’s Formulae for Psychophysics. 


The rules for the solution of this problem have been formulated in general terms by William 
Brown. It is now required (1) to make their formulation specific, and (2) to show how they 
work out in actual practice. This means that the writer must show the steps to be taken, 
in the treatment of a complete set of data (Vollreihe), for the attainment in every case of a 
definite result. The calculations should be arranged with a view to practical application, i.e. so 
that the amount of computation is reduced to a minimum. If the labour of computation can be 
reduced by new tables, this fact should be pointed out. 


The paper must contain samples of numerical calculation, but it is not necessary that the 
writer have experimental data of his own. In default of new data, those of F. M. Urban’s 
experiments on lifted weights (all seven observers) or those of H. Keller’s acoumetrical experi- 
ments (all results of one observer in both time-orders) are to be used. 


Papers in competition for this Prize will be received, not later than December 31st, 1914, by 
Professor E. B. Titchener, Cornell Heights, Ithaca, N.Y., U.S.A. Such papers are to be marked 
only with a motto, and are to be accompanied by a sealed envelope, marked with the same motto, 
and containing the name and address of the writer. The Prize will be awarded by a committee 
consisting of Professors William Brown, E. B. Titchener and F. M. Urban. 


The committee will make known the name of the successful competitor on July 1, 1915. 
The unsuccessful papers, with the corresponding envelopes, will be destroyed (unless called for 
by their authors) six months after the publication of the award. 


Corrigendum. Dr Derry has most kindly pointed out a slip on p. 307, Vol. VIII; the value 
of 100 (B − H)/L for Congo female crania is +1.9 and not −1.9, which brings these crania nearer 
to their proper place, and the remarks on this point p. 308 should accordingly be cancelled. 


A PIEBALD FAMILY. 
By E. A. COCKAYNE, M.D., M.R.C.P. 


In spite of the great interest, which they have always excited, well 
authenticated examples of piebalds in the dark races have been found to be rare. 
In the white races they are much less conspicuous, in part owing to the presence 
of clothing and in part owing to the lack of contrast between the pigmented 
and unpigmented skin, but the likelihood of their coming under the notice of 
a skilled observer is much greater. The scarcity of records shows that piebalds 
in the white races also must be very uncommon. Last year I met with a case 
in a baby, and found that the child belonged to a family, many of whose members 
showed a similar defect of pigmentation. The family, belonging to a farming 
stock, originally came from the neighbourhood of Bury St Edmunds in Suffolk, and 
the anomaly is known to have descended directly through six generations. The 
oldest member, with whom I have talked, is fairly certain that it was present in 
one generation at least before this. 

Of the first two generations in the pedigree (see Plate XI), I could obtain no 
definite information except the statement as to the existence of the piebaldism in 
I. 1 and II. 1, but of the third, III. 2 is said to have had a frontal blaze of white 
hair and white skin on the neck and forearms, which was very conspicuous owing 
to its marked contrast with the neighbouring weather-stained normal skin. 

III. 4 appears to have been the only member of the family who showed a 
marked dislike to the condition and always wore a wig to hide the frontal blaze. 
III. 2, whose family name was C——*, had fifteen children. The first, IV. 2, 
a male, with dark hair, married twice, and had eight normal children, five by the 
first wife and three by the second. The second child, IV. 5, was a piebald, with a 
large frontal blaze, white skin on the front of the neck and arms, and blue eyes. 
He transmitted the condition to all his three children. V. 3, the eldest boy, 
aged 22 and unmarried, possesses dark hair, with a V-shaped frontal blaze of 
white or cream coloured hair, the apex of the V commencing near the coronal 
suture and spreading out to a width of 3½ inches, as it reaches the forehead. The 
eyebrows and some of the eyelashes are white. The next boy, V. 4, is aged 18. 
He has light hair and a very large blaze of unpigmented hair, which covers the 
whole of the top of the head. His eyebrows and eyelashes are white, and the eyes 
are blue. Both boys have white patches on the front of the neck and on the arms 
(see Plate XII). 

* Names preserved in the confidential register of the Galton Laboratory. 

Next in the fourth generation were twins, IV. 6 and 7, both piebalds. They 
were evidently not uniovular, because one had dark hair, and one light, and the 
white blazes were dissimilar in extent, but it is uncertain which had the larger. 
Both died at an early age. 

The next, IV. 8, a girl, was normal with dark hair and eyes and remained 
unmarried. Next came a woman, IV. 10, who was a piebald with a large frontal 
blaze, white eyebrows and eyelashes, and white skin on the front of the neck and 
forearms. The right eye was blue, and the left brown (see Plate XIII (B)). Her 
child, aged 13, is quite normal with light hair and dark eyes. 


The next child, IV. 12, Mrs W——, has a large frontal blaze and dark brown 
irides. There is a large irregular patch of white skin extending from just below 
the chin to the heads of the clavicles, and round it the skin appears to be more 
deeply pigmented than the rest of the skin of the neck. There are a few small 
islands of pigmented skin near the edge of the unpigmented area. The skin of 
the anterior aspect of the forearms is unpigmented from the elbows to the wrists, 
and here also, there are some small islands of pigmented skin in marked contrast 
to the unpigmented area, in which they lie (see Plate XIV). 

The first two children of this individual were daughters, V. 8 and V. 9, both 
piebald, the third a normal son, V. 10, and then three more piebald daughters, 
V. 11–13. The first of the daughters, Mrs G——, V. 8 (see Plate XIV), is very 
fair with a very large frontal blaze covering the whole of the top of the head, 
and her eyebrows and eyelashes are white. Her normal hair has pale creamy 
diffused pigment and, according to the individual hair, some to a decided number 
of granules*. The hair of the blaze has no diffused pigment and no granules. 
The irides are light brown, but the outer segments on both sides are paler and 
greenish in colour. The skin of the forehead and base of the nose is very pale in 
colour. She has a large white patch on the skin of the front of the neck, beginning 
just below the chin and widening out so as to embrace that over the inner ends 
of both clavicles. As in her mother there appears to be some concentration of 
pigment round this white area, and there are small isolated areas of pigmented skin 
near its edge. She has unpigmented skin on the anterior aspect of both forearms. 


Of her two children the first, VI. 1, a boy aged 8, is normal, the second, VI. 2, 
a boy aged 1½, is a piebald (see Plate XIV). This child, VI. 2, was nine months 
old when first seen. He had a very large frontal blaze, resembling that of his 
mother and covering all the top of the head, the eyebrows and eyelashes were 
white with the exception of some of the outer hairs. Hair, pale cream in colour, 
said to be from the light area, has pale creamy diffused pigment and some granules 
(β), the granules being very small. It was obvious, even at this age, that 
heterochromia iridis was present. The right iris was pale except for a sector of dark 
grey occupying the upper and outer quadrant, the left iris was entirely dark grey. 
No difference in colour of the skin of the neck or forearm could be made out. 


* β to γ on the Galton Laboratory scale of granular pigmentation. 

When the baby was seen after the summer of 1913, the grey portions of the irides 
were becoming brown, the pale portion was still light blue. The face and arms were 
sunburnt, and it was noticed that the forehead was paler than the rest of the face. 
There was a pale area on the front of the neck, and the whole anterior surfaces of 
the forearms were white, the edges being very irregular in contour. There was 
also a white streak running obliquely right across the posterior or extensor aspect 
of the left forearm, and this offered a marked contrast with the rest of the surface, 
which was very brown. When the sunburn had died away the difference between 
the pigmented and unpigmented skin could no longer be made out. 

IV. 12’s second daughter, V. 9, aged 23, has only a small cream coloured 
frontal blaze, and the rest of her hair is light brown (see Plates XV and XVI). 
The eyebrows are composed of an even mixture of brown and white hairs, and 
the eyelashes are similar, with brown and white hairs alternating. The irides are 
grey and uniformly pigmented. There is a large irregular area of white skin at 
the base of the neck. 

The whole of the anterior aspect of the right forearm is unpigmented, and 
there are similar small areas scattered over the posterior aspect (see Plate XVII). 
The left forearm is white only on the anterior aspect. 

The next girl, V. 11, is aged 9. She has a very small frontal blaze, but the 
skin of the forehead is pale (see Plate XV). The eyebrows show a division into 
two parts, on the inner halves grow white hairs only, and on the outer brown hairs. 
The eyelashes on the contrary consist of alternate brown and white hairs. The 
irides are grey and uniformly coloured (see Plate XVIII). There is only a small 
white area in the middle of the front of the neck, but there are well differentiated 
white areas on the anterior aspects of both forearms (see Plate XIII (A)), and on 
the inner aspects of both upper arms. Her hair was examined and the first sample 
showed very pale diffused pigment and some granules (β). Two more samples 
were then examined, one from the blaze and one from the neighbouring part of 
the scalp. The first showed no diffused pigment and no granules, the second 
showed the majority of hairs with yellow-brown diffused pigment and a decided 
number to plenty of small granules (γ–δ), but a few had no diffused pigment 
and no granules. 

The next piebald child, V. 12, died young. She had a frontal blaze and blue 
irides. Some of her hair showed very pale diffused pigment, and some granules (β). 

The next child, V. 13, also died young. She was a piebald nearer to the 
classical type than any of the others. She had a large frontal blaze, white skin 
on the forehead, and large areas of white skin on the front of the neck and 
chest, and in addition a very extensive area on the abdomen. 

Of the fourth generation the next child, IV. 13, was a male with dark hair and 
eyes, who had 5 normal children; the next, IV. 15, had fair hair and died young. 
Twins, IV. 16 and 17, came next and died in infancy*. They were heterogeneous, 
a dark-haired boy and a light-haired girl. A girl, IV. 19, was born next and she 
had twin sons, V. 15 and 16, who were also normal. The last three children, 
IV. 20–22, a girl, a boy, and one whose sex I am unable to ascertain, were all 
normally pigmented and all died at a very early age. 

* The tendency to twin in this family is worth noting. 


The pedigree confirms the strongly hereditary nature of piebaldism, and in this 
as in other published cases the character can affect either sex, but has only been 
transmitted by those affected. Unless we are to assume that in the case of such 
a rare anomaly as piebaldism, I. 2, II. 2 or III. 1, were really unnoticed piebalds, 
then III. 2 could only be heterozygous, or since piebaldism is dominant a (DR). 
We must take IV. 4, IV. 9, IV. 11 and V. 7 for pure recessives (RR). Thus the 
number of piebalds in the five sibships of generations IV, V and VI should be one 
quarter, i.e. ¼ (15 + 3 + 1 + 6 + 2) = 7 nearly. We have actually 14 out of 27, 
thus piebaldism does not seem to act numerically as a pure dominant. 
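
The count in the preceding paragraph can be checked with a few lines. The sibship sizes and the observed number are those given in the text; the variable names are my own. The one-half line is not in the text: it is added only because one half is the usual textbook expectation for a DR × RR mating, and it is shown here purely for comparison.

```python
from fractions import Fraction

sibship_sizes = [15, 3, 1, 6, 2]  # the five sibships counted in the text
observed_piebalds = 14            # piebalds actually recorded

total = sum(sibship_sizes)
expected_quarter = Fraction(1, 4) * total  # the text's "7 nearly"
expected_half = Fraction(1, 2) * total     # comparison only, not the text's figure

print("children counted:", total)
print("one quarter of them:", float(expected_quarter))
print("one half of them:", float(expected_half))
print("observed proportion:", round(observed_piebalds / total, 3))
```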

The areas of unpigmented skin are less than in the classical piebalds, but it is 
probable that in some, at least, they are larger and more numerous than I have 
stated. On the covered parts of the body and legs, which I was unable to 
examine except in the baby, they would not be very noticeable. It was not until 
I had noticed the white skin on the neck and arms of one of them that I was told 
anything about the existence of similar patches on the others. If true, it is 
remarkable that none have had white patches on the legs. 


With regard to the local distribution of the pigment, there appears to be an 
excess at the edges of some of the unpigmented areas, as has been noted in other 
cases. In the case of other pale areas, the demarcation between them and the 
normal skin is very slight, and is probably due to the fact that they are not wholly 
unpigmented. This remark applies especially to the forehead, which in some of 
them looks paler than natural, but not wholly devoid of pigment. 


In some the eyelashes are alternately white and brown, and in others the eye- 
brows are similar, and in one at least hairs growing on the scalp near the blaze 
are in some instances entirely without either diffused or granular pigment. This 
suggests that the skin beneath may show a deficient and irregular distribution of 
pigment. 

The most interesting feature is the occurrence in three members of the family 
of well-marked heterochromia iridis, a character which has been met with in 
members of a piebald family, but always independently of their piebaldism, never, 
as in this case, in true association with it. It proves conclusively that these cases 
are not congenital leucoderma. 

There seems to be no association of piebaldism and general lack of pigmenta- 
tion of hair and irides. Affected and unaffected members have been both fair and 
dark, but the fairest piebalds seem to have the most extensive frontal blazes. 

In the cases photographed the individuals were blonds and there has been great 
difficulty in getting a good photographic contrast of differences of pigmentation 
very noticeable in the living subject. 


Plate XI. Pedigree of Piebald Family. (Biometrika, Vol. X, Parts II and III.) 
Key to symbols: Piebald; Normal; Heterochromia iridis; Died young. 


Plate XII. V. 3 and V. 4 as children showing their marked V-shaped frontal blazes. 

Plate XV. Two sisters with white frontal blazes. 

V. 9. Showing white forelock or blaze. 

Right forearm of V. 9 showing white patches on posterior aspect. The photograph is untouched and it is difficult 
to bring out by photography the grades of pigmentation when the arm is untanned by the sun, although they 
are quite clear on actual inspection. 

Large photograph of V. 11 to show paleness of forehead and white hairs on inner half of eyebrows. 


CLYPEAL MARKINGS OF QUEENS, DRONES AND 
WORKERS OF VESPA VULGARIS. 


By OSWALD H. LATTER, M.A. 


Upon the front of the head of Vespa vulgaris certain yellow markings stand 
out conspicuously upon the otherwise black surface. Below the three ocelli 
and between the upper portions of the two compound eyes there is a median 
four-sided yellow patch, the “corona”; to the right and left of this, separated 
from it by a fairly wide interval, and occupying the bay of each of the compound 
eyes is a pair of elongated yellow blotches; while straight below the corona and 
between the lower portions of the compound eyes is a very conspicuous yellow 
area which extends over the clypeus and down to the labrum or upper lip which 
lies between the two mandibles. This clypeal patch of yellow bears upon it a 
black mark which is subject to considerable variation. I distinguish in the queens 
and workers five chief types of this black mark: see diagrams on p. 202. In 
Type I a broad vertical black band extends right through the yellow patch from 
the top to the bottom; a little below its middle the band bears to right and left 
a pair of bluntly pointed and slightly upturned arms: the portion of the median 
band below these arms is somewhat narrower than that above. Type II is derived 
from I by suppression of the black portion below the transverse arms. In Type III 
the extent of the black colouring is yet further reduced by the absence of the 
upper half (or thereabouts) of the vertical band. In Type IV the lower part of 
the vertical band re-appears, but the width of all the components is very much 
less than in any of the preceding types. In Type V the component parts of the 
black marking cease to be in contact; the upper portion of the vertical band 
is interrupted by a broad belt of yellow; the two “arms” are separated from the 
lower part of what remains and from one another; while there is no black at 
all below these remnants of the “arms”—a feature recalling Types II and III. 
Types IV and V are however represented only by single individuals in the series 
examined. 


Between these main types certain intermediates occur. Thus some individuals 
have the black piece below the “arms” very narrow, approximating therefore 
to II, but conforming to I if we take extension of the black right through the 
yellow as the criterion of I; such individuals are distinguished as I + II. Others 
again conform to II but possess a slightly darker stain on the yellow in the line 
where the distinctive lower black portion of I might occur; these are called II + I. 
Similarly, intermediates between II and III are recognisable: in II + III the top 
of the upper portion of the vertical black band is very narrow; while in III + II 
there is a mere stain on the yellow of this region. A single instance occurs of an 
intermediate between I and III (I + III), where the vertical black band extends 
right through the yellow, but is much narrowed at its upper extremity. 


[Figure: Front of head of V. vulgaris, with the ocelli, the eye and the yellow patch in the bay of the eye labelled. (All parts left white are actually yellow.) Below, the clypeus alone is drawn for Types I, II, III, IV, V, VI, VI (½ VII), VII, VII (½ VIII), VIII and IX. Drawn by K. W. Merrylies.] 


My first examination consisted of about 200 tubes containing queens of 
Vespa vulgaris from different nests. In the case of some queens the heads 
were missing, and in the course of transit the contents of some of the tubes 
had got loose in the jar. I have numbered these 199 to 208. The results are 
given in Table I (p. 204) and the summary below: 

Type I      Pure          50 
            I + II        11       62 
            I + III        1 
Type II     II + I        22 
            Pure          94      120 
            II + III       4 
Type III    III + II      [illegible] 
            Pure           3      [illegible] 
            III + IV       0 
Type IV     IV + III       0 
            Pure           1        1 
            IV + V         0 
Type V      V + IV         0 
            Pure           1        1 
Total: 185 


It will be seen that transitional cases undoubtedly occur. The bulk of the 
queens, however, fall into Types I and II, or queens are very little variable. 
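
The rule used in the summary, by which an intermediate such as II + I is counted under the type named first, can be sketched as below. The helper name and the short list of labels are hypothetical and serve only to show the tallying; they are not a transcription of Table I.

```python
from collections import Counter


def main_type(label: str) -> str:
    """Return the leading type of a label, e.g. 'II + I' is classed under 'II'."""
    return label.replace(" ", "").split("+")[0]


# Hypothetical sample of labels (NOT the entries of Table I).
sample = ["I", "I + II", "II + I", "II", "II", "II + III", "III", "V"]

counts = Counter(main_type(label) for label in sample)
for type_name in ["I", "II", "III", "IV", "V"]:
    print(type_name, counts.get(type_name, 0))
```

Applied to the full table, the same tally would give the bracketed totals per type shown in the summary.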


To test: (i) whether this variability was still further lessened by taking only 
the queens from a single nest, and (ii) the relative variability of queens, drones 
and workers, I now examined all the queens, workers and drones of a single nest 
of V. vulgaris. 

In this case all the 127 queens were of Type II*. 


The classes of the workers are given in Table II (p. 205) and may be sum- 
marised as follows: 


Type I      Pure                  5 
            v.s. or (I + II)?     5       10 
Type II     II + I                6 
            Pure                156      162 
Total: 172 


It will be seen that they are somewhat more variable than the queens of the 
same nest, but not so variable as queens from different nests. 


I now turn to the drones of this same nest. I had 150 at my disposal. 


The drones exhibit a very wide range of facial markings. In the material 
examined comparatively few fall into the scheme of classification adopted for 
the queens and workers, and it thus becomes necessary to resort to six types 
of face which appear to be peculiar to the male sex. These are numbered VI, 
VI (½ VII), VII, VII (½ VIII), VIII and IX, see diagrams, p. 202. In Type VI 
there are two somewhat elongated black dots upon the yellow clypeus, one being 
sub-central, the other on the ventral margin; in VI (½ VII) the ventral dot is 
longer dorso-ventrally and a third dot appears upon the left side (right side, in 
figure seen from in front) opposite the gap between the two previous dots; 

* There were 129 queens in this nest, but No. 34 was missing and No. 98 had its head damaged too 
badly for classing. 


TABLE I. 


Types of Clypeal Markings in V. vulgaris Queens. 


No. | No. | No No. No. | 
| | 
| | 
| oe = 44 III 87 | 1+II | 180 = 173 Ill 
hee II Dials 88 II 131 II 174, | Tees 
| 3 | Il4I 46 II 89 ae a 132 II 175 II 
| 4 1 47 = 90 alge 133 Il WG ho WIESE 
ae ig alle AS. Wise Oe L 134 — 177 Tore 
6 at AQ) 3) > Sel 92 et 135 I Tey) ~ UE 
re |) IE BORE in ale 93) LEE in| 36 II 179 esi 
3 ail 51 if 949 i 137 II 180 II+I 
9 | II o2 II 955) ILE 138 I 181 II 
10 || I 5a el 96.09) 139 II 182 = 
11 Il+I 54a eee 97 I 140 I 183 | I1+II1 
12 II+1 ii) 10 98 dee 141 I 184 = 
Lea 9 al 56 md 99 |II+IIL} 142 Teen 185 I 
14 oe 57 IL 100 I 143 I 186 Il 
15 II 58 pa 101 ats 144 Il 187 =|) St 
16 II 59 I 102 Il 145 II 188 | I 
7 A eaeale 60) meet 103 I 146 I 189 I+II 
18 Il 61 IL 104 IU 147 II 190 II 
19 | II+I 62 II 105 Il 148 =e 191 I 
20) ae 63 II 106 a 149 i 192 I 
HS ee Al 64 | II 107 I 150 Il 193 III 
22a mel (ij) = TU 108 Il 151 II 193.4 II 
937 | 21 66 II 109 I 152 = 194 = 
OAgea eles 67 = TOKO} 7 |) S50 153 II 195 | I+III 
25 IL+I Con ay tll 111 | IE+T 154 1H 196 II 
26 I 69 eet 112 Il sys | AU 19 jee 
27 Il 70 113 I 156 Il 198 IIl+I 
28 | I a Il 114 II 157 Til 
29 TST 72 ] 115 |II+I0T} 158 IIl+I = - 
30 I a eal 116 II 159 Il 
31 II Mee letsel Has IU 160 I Loose | 
32 II Aven AT 118 I 161 I 
33 V 75 | II ie) |). HE 162 Il 199 I 
34 Tal M6 | SUL 120 |II+III} 163 I 200 I 
Si alipageeee i, ae 191. || 7 ale 164 II 201 I 
36 iat AGH), eae 129° | 165 II HO i) ULE 
37 I oe ee 122 -- NGS | IM 2O3ke ee lal 
38 II 80 ee 124. I Gy | 7 204 Il 
39 Il+I] Se eee 125 II 16S 205 II 
40 ete 82 - 126 elie es 169 |) »— 206 Il 
Alton lee 83 I NO Wel 70a) eae 207 Ll 
AN Oyl ad 845 i 128 | II 171 I 208s 
| 42 Il 85 128b Il 172 = 209) | sri | 
| 43 I 86 | TSU 129 SHI | | 
| 
TABLE II. 
Types of Clypeal Marking of Workers of a single Nest of V. vulgaris. 
No. No. No. No. No. 
1 — 37 II 74 II 112 II 150 II - 
2 — 38 IT 75 II 113 II 151 II+I1 
3 — 39 II 76 II 114 II 152 II 
4 _- 40 II 77 II 115 II 153 IT 
5 — 41 II 78 — 116 II 154 IT 
6 — 42 II 79 I 117 II 155 II 
7 — 43 II 80 II 118 II 156 II | 
8 —- 44 II 81 II 119 II 157 Il | 
9 II 45 |Iv.v.s. 82 II 120 II 158 II 
10 II [1+11] 83 II 121 II 159 | II 
11 II 46.) I 84 II 122 II 160 II 
12 Iv. s. 47 II 85 it 123 II 161 II 
[I+11] ] 48 I 86.) 1d 124 I 162 | II 
13 II 49 —— 87 II 125 II 163 II 
14 II 50 IT 88 II 126 II 164 I+! 
15 I 51 II 89 II 127 II+1 165 II 
16 II 52 II 90 II 128 II 166 II 
17 II 53 II 91 II 129 Il 167 Il 
18 IT 54 II 92 II 130 II 168 IT 
19 II 55 II 93 II 131 II 169 II 
20 II 56 II 94 Il 132 II 170 II 
21 II 57 IT 95 IT 133 II 171 II 
22 IT 58 Iv. s. 96 II 134 II 172 II 
93. | Tf Aeim o7 fo = | 135 | mm} ia | 1 
24 II 59 II 98 II 136 II+I1 174 II 
25 I 60 II 99 II 137 II 175s I 
26 II 61 II 100 II 138 II 176¢" If 
27 II 62 lil 101 II 139 1H ied, II 
28 II 63 II 102 II 140 II 178 II 
29 II 64 II 103 IT 141 II 179 II 
30 Iv.s. 65 I 104 II 142 Il 180 Il 
f[+it)| 66 -| IL 105 | II ea iii [er 
31 II 67 II 106 II 144 II 182 II 
32 II+I 68 II 107 II 145 II 183 II 
33 II 69 I 108 II 146 II 184 II 
34 — 70 II 109 II+I 147 II 185 — 
35 IAs al II 110 II 148 II 186 — 
(I+1I] } 72 II 111 II 149 IT 187 II 
36 II 73 II 


in VII the two median dots are united by a slender black line, and there is a pair 
of lateral dots, right and left; in VII (4 VIII) the median line is of uniform 
width, extending from about the centre to the lower margin, and to its left side 
there is a single dot; in VIII the median line alone is visible, both lateral dots 
having disappeared; while in IX there is no continuous median line, but merely 
two black spots, one at the extreme dorsal and the other at the extreme ventral 
side of the clypeus. It will be noticed that Types VII—VIII approximate to 
Type IV in so far as the black stripe begins at about the middle of the clypeus 
and extends right down to the ventral margin. 

The data are given in Table III (p. 207) and are summarised below: 

Type I      very narrow           6 
            [remaining entries illegible] 
Type II     no horns 1, very narrow 1      2 
Type III                          0 
Type IV                           0 
Type V                            0 
Type VI     Pure                 59 
            VI (½ VII)            1       60 
Type VII                          [illegible] 
Type VIII   (VIII near VI)        8 
            (VIII + ½ VII)        2       58 
            Pure                 48 
Type IX                           2 
Total: 151 


It will be realised at once how far more variable the drones, of even one nest, 
are than the workers or queens for this character. But their variability is rather 
of a negative than a positive character, appearing to consist in more or less extensive 
absence of the fuller markings of queen and worker. 
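
One rough way to put a number on these comparisons is the proportion of individuals falling outside the commonest type (sometimes called the variation ratio). This measure is my own choice for illustration and is not used in the paper. The counts are taken from the summaries in the text, except that the worker count of 162 for Type II is inferred as 172 less the 10 of Type I.

```python
def variation_ratio(modal_count: int, total: int) -> float:
    """Proportion of individuals that do not belong to the commonest type."""
    return 1 - modal_count / total


# (count in commonest type, total classed)
groups = {
    "queens, different nests": (120, 185),  # Type II
    "queens, single nest": (127, 127),      # all Type II
    "workers, single nest": (162, 172),     # Type II; 162 inferred as 172 - 10
    "drones, single nest": (60, 151),       # Type VI
}

ratios = {name: variation_ratio(*pair) for name, pair in groups.items()}
for name, value in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.3f}")
```

On this crude measure the order comes out as queens of one nest, workers, queens of different nests, drones, which agrees with the order of variability described in the text.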


The results here deduced for variability of non-measured characters do not 
wholly agree with those found by Wright, Lee and Pearson on the wing measure- 
ments of the same nest of V. vulgaris. They found that for absolute measurements 
the variability as determined by the coefficient of variation was in every case such 
that the worker was more variable than the drone and the drone than the queen. 
On the other hand they found when they dealt with indices that the drone for 
wing measurements was slightly more variable than the worker and the queen 
less variable than either*. Possibly the divergence apparent here may be explicable 
in the sense of the drone’s variability lying in the present case in an absence 
of marking rather than in any positive variation. The drone’s variation is about 
a centre of much diminished marking. If we could measure the variation in the 
total area of marking in queen and worker we might find it as great as the varia- 
tions in the smaller markings of the drone. 


It would be of much interest to investigate a series of drones from different 
nests. It is clear that the clypeal markings form a secondary sexual character 
and they would probably provide classifications for hereditary purposes. 


* Biometrika, Vol. v. pp. 414 and 421. 

TABLE III. 


Types of Clypeal Markings in Drones of a single Nest of V. vulgaris. 


No. No. No. No. | No. 
| 
1 VIII 33 VII 63 VII 93 VI 123 VIII 
near VI} 34 VI 64 Wi 94 VI 124 I 
2 VI 35 VIII 65 VII 95 VI narrow | 
3 VIII 36 VI 66 VIII 96 VII 125 VI 
4 VIII 37 VI 67 I 97 il 126 VIII 
5 VI 38 VII very very 127 VII | 
6 VI 39 VI narrow | narrow | 128 VII | 
7 VIII 40 VII 68 VI 98 | VIII 129 VIII 
8 VIII 4] VI 69 VIII 99 VIII 130 VIII 
9 VIII 42 VI 70 VI 100 VI 131 VI | 
10 IX 43 VIII Hil II 101 | VI gy VIII | 
11 VI 44 VI but no 102 AVIA 133} VI 
12 VIII 45 VIII horns 103 VIII 134 VI 
13 VI 46 VIII 72 VIII near VI} 135 VI 
14 Vall 47 VI Wp} VI 104 VI 136 VIII 
15 VIII 48 VIII 74 VIII 105 VIII 137 VI 
near VI} 49 VII 75 VIII 106 VII 138 I 
16 VIII 50 VIII 76 VIII 107 VIII dots of 
17 Vi 51 VII 77 VI 108 VIII VII 
18 VI 52 VII 78 VI 109 VI 139 VII 
19 VIII 53 VIII 79 I 110 VI 140 VI 
20 VII 54 VI narrow Wadi WAL 141 VI 
21 VII 55 I 80 VI 12 VIII 142 VIII 
22 VIII very 81 VI 13 VI 143 VIII 
23 VIII narrow | 82 VIII 114 VIII near VL 
near VI} 56 IX near VI} 115 VI 144 VI | 
24 VII 57 I 83 VIII 116 VI 145 VIII | 
25 VII very 84 VII 117 II near VI 
26 VI narrow | 85 VI very 146 VIII 
De VI 58 VIII 86 VI narrow 147 VI 
28 VI near VI} 87 VIII 118 == 148 VIII 
4 VII 59 VIII 88 VII 119 VII 149 Vali 
29 VIII 60 VIII 89 VI 120s ee Vall 150 VI 
30 VIII 61 VIII 90 VIII 121 VI 151 VI 
31 VI 62 VIII 91 VII 122 VI 152 VI 
32 VIII iVII 92 VII 
4 VII 
TABLE OF THE GAUSSIAN “TAIL” FUNCTIONS; 
WHEN THE “TAIL” IS LARGER THAN THE BODY. 


By ALICE LEE, D.Sc. 


In a paper published in Biometrika, Vol. VI. pp. 59–68, tables for the 
incomplete normal moment functions were printed, and they have since been 
reproduced in Tables for Statisticians and Biometricians recently issued from the 
Cambridge University Press. From these tables values of the Gaussian “Tail” 
functions were deduced and a short table of $\psi_1$ and $\psi_2$ appeared in Biometrika, 
Vol. VI. p. 68. The value of these functions being demonstrated in practice 
during the last few years, a more complete table of $\psi_1$, $\psi_2$, $\psi_3$ has appeared in the 
Tables for Statisticians and Biometricians. 


In the introduction to those tables, however, Professor Pearson indicated that 
it was important to have a similar table when the “tail” forms more than half the 
entire curve, and gave the fundamental formulae for obtaining the numerical 
values of the functions. The present table has been calculated to supply the 
want thus indicated. 


[Figure: Gaussian curve truncated at the ordinate AB; the points E, B, O, H, C are marked along the base.] 


Let the figure represent a Gaussian curve of total population $N$ and standard 
deviation $\sigma$. Let AB be the ordinate at which it is truncated and let 
$OB = h = h' \times \sigma$. 
Let GH be the ordinate through the mean G of the truncated portion and $BH = d$, 
the distance of the mean from the line of truncation, let $\Sigma$ be the standard 
deviation of the truncated portion about GH, and $n$ = the area of the truncated 
portion, or of the population observed. Then if any material be supposed to 
form a truncated portion of a normal curve, $d$, $n$ and $\Sigma$ can be found (see Tables, 
pp. xxvii and 25). 


We have $\psi_1 = \Sigma^2/d^2$ ...... (i), 
$\psi_2 = \sigma/d$ ...... (ii), 
$\psi_3 = N/n$ ...... (iii). 


These are tabled for each value of $h'$, at first proceeding by .01 and then by .10 as 
unit. Now $\psi_1$ being known we find $h'$ from the table, and hence deduce $\psi_2$ and 
$\psi_3$. $\psi_2$ gives us the value of $\sigma$ from known $d$. Hence $h = h' \times \sigma$ can be found, 
lastly (iii) gives us the total population from which $n$ is drawn. Thus the constants 
$N$, $\sigma$ and $h$ which fix the total Gaussian are determined. 
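
The procedure just described can be sketched numerically without the printed table. The sketch below is an illustration only, not the method by which the table was computed: it assumes the three functions are $\psi_1 = \Sigma^2/d^2$, $\psi_2 = \sigma/d$ and $\psi_3 = N/n$, evaluates them from the standard formulae for a normal curve cut off at $h'$ beyond the mean, and inverts $\psi_1$ by bisection in place of interpolation. The function names are hypothetical.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tail_functions(h1):
    """Return (psi1, psi2, psi3) for h1 = h/sigma >= 0.

    The observed portion is assumed to be everything on the mean's side
    of the truncating ordinate (the "tail" larger than the "body").
    """
    lam = phi(h1) / Phi(h1)           # shift of the observed mean, in sigma units
    m = h1 + lam                      # d / sigma
    v = 1.0 - h1 * lam - lam * lam    # Sigma^2 / sigma^2
    return v / (m * m), 1.0 / m, 1.0 / Phi(h1)


def solve_h1(psi1, lo=0.0, hi=6.0):
    """Invert psi1 by bisection; psi1 falls as h1 rises on [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tail_functions(mid)[0] > psi1:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_truncated(Sigma2, d, n):
    """Recover h', sigma, h and N from the observed Sigma^2, d and n."""
    h1 = solve_h1(Sigma2 / d ** 2)
    _, psi2, psi3 = tail_functions(h1)
    sigma = psi2 * d
    return {"h1": h1, "sigma": sigma, "h": h1 * sigma, "N": psi3 * n}
```

Called as `fit_truncated(2.8851, 2.6159, 69)`, with the figures of the first femur illustration in this paper, the sketch should return values close to the tabular results quoted there; small differences are to be expected because the paper interpolates in a table.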


It will be sufficient to illustrate the method of using the tables on certain data 
as to the English thigh-bone, recently published by Parsons*. 


Dwight† has adopted a method of sexing human femora on the basis of a 
markedly bimodal distribution obtained by him for American bones. He terms 
female any femur with diameter of head less than 45 mm., and male any femur 
with diameter of head over 47 mm. Parsons follows this rule and sexes by other 
points femora with heads from 45 to 47. As unsettled remainder he has 20 femora 
of 45 mm. and he gives 12 to ♀ and 8 to ♂; of 46 mm. and 47 mm. he has 41 
femora and he gives 4 to ♀ and 37 to ♂. As a result of this process he obtains 
a female frequency curve which rises very abruptly at high values of the diameter, 
and a male frequency curve which rises very abruptly for low values of the femora. 
But, if there really be any marked skewness in frequency of the parts of the 
human skeleton, which is very unusual, we should anticipate that it would be of 
the same sense. Parsons’ distributions are as follows (loc. cit. p. 256): 


50 | 51 | 52 


| 
36 | 87 | 38 | 39 | 40 | 41 | 42| 48 | 44.| 45 | 46 | 47 | 48 | 49 


1 1/—| 3 |] 8 |] 14; 12) 18 | 12] 12} 3 
8 | 8 


a 1 can 
99) | 1741 Si || 19 i 10 


bo 


The ♀ 48 mm. femur according to the rule should have been treated as a 
male but presumably it had marked female characters. Were there no marked 
male characters in any bone below 45 mm.? It will be seen that there is a 
remarkable dip in the total material at 46 mm. which corresponds to Dwight’s 
division. In material measured six years ago in the Biometric Laboratory, where 
every bone in a relatively large series was measured, no such dip occurs and there 
is in those data no justification for Dwight’s method of sexing. The group of 
29 ♂ bones at 47 mm. and the sudden cut off at 45 mm. seems to condemn this 
method of sexing, at any rate from the statistical standpoint. 

* Journal of Anatomy and Physiology, Vol. xiv. pp. 238—267. 


† American Journal of Anatomy, Vol. IV. p. 19. 
‡ This material has been statistically reduced and will shortly be published. 




Without arguing this point out here, we may illustrate the use of the Table 
(p. 214) of $\psi$’s by taking two of Parsons’ frequency distributions for females; we 
will cut them off at the points suggested, and then investigate the total 
populations of females which result. Our author pools for these distributions right 
and left bones. 


Taking the diameter of “head of femur” for the females, we have 


Diameter in mm. ... | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 
Frequency ...       |  1 |  1 |  – |  3 |  8 | 14 | 12 | 18 | 12 


These are exactly the bones the Dwight process gives as female. We find 


$\Sigma^2 = 2.8851$, 
$d = 2.6159$ (measured from 44.5). 
Hence $\psi_1 = \Sigma^2/d^2 = .4216$. 


Whence by interpolation from the table 
$h' = .782$, $\psi_2 = .864$, $\psi_3 = 1.278$, 


leading to $\sigma = 2.260$, $h = 1.767$, 
and Mean = 42.73 mm., $N = 88.2$. 
Parsons gives for R. femur, Mean = 43, 
L. femur, Mean = 42, 
and the total number of bones dealt with 55 + 48 = 103 (Tables, loc. cit. pp. 249–251). 
In his frequency distribution (p. 256) he only records 85 female bones, 
which give a mean of 42.54 and a standard deviation of 2.078 mm. These values 
are clearly not widely divergent from those we have found above by supposing 
all bones under 45 to be female. 
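
The quantities $d$ and $\Sigma^2$ entering this illustration can be checked directly from the frequencies. The short sketch below assumes the distribution reads 1, 1, none, 3, 8, 14, 12, 18, 12 for diameters 36 to 44 mm., takes distances from the truncating ordinate at 44.5, and applies no grouping correction; the function name is hypothetical.

```python
def stump_moments(freq, cut):
    """Return (n, d, Sigma2) for a frequency table measured from `cut`."""
    n = sum(freq.values())
    first = sum(f * abs(cut - x) for x, f in freq.items()) / n
    second = sum(f * (cut - x) ** 2 for x, f in freq.items()) / n
    # Sigma2 is the variance of the truncated portion about its own mean.
    return n, first, second - first * first


# Female femora with head diameter under 45 mm. (frequencies as read above).
female_heads = {36: 1, 37: 1, 38: 0, 39: 3, 40: 8, 41: 14, 42: 12, 43: 18, 44: 12}

n, d, Sigma2 = stump_moments(female_heads, 44.5)
psi1 = Sigma2 / d ** 2
print(n, round(d, 4), round(Sigma2, 4), round(psi1, 4))
```

On this reading the 69 bones reproduce the values of $d$, $\Sigma^2$ and $\psi_1$ quoted in the text to four decimals, which supports the reading of the frequency row.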


To test the matter further the 105* female bones of which the head was 
measured by Parsons were taken out. They provide the distribution : 


Diameter in mm. ... | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 
Frequency ...       |  1 |  1 |  – |  4 | 10 | 18 | 16 | 24 | 13 | 14 |  3 |  – |  1 

These give Mean = 42.47, S.D. = 1.996, $N = 105$. 


* It is not possible to say whether he has omitted two queried measurements. He has not omitted 
bones he queries in breadth of lower articulation. 


Cutting off all bones over 44.5 we find 
$\Sigma^2 = 2.6392$, $d = 2.6264$, 
leading to $\psi_1 = .3826$. 
Hence $h' = .984$, $\psi_2 = .783$ and $\psi_3 = 1.195$. 
These provide for the non-truncated population, 
Mean 42.48, S.D. = 2.056, $N = 104$, 


which are in still better agreement with Parsons’ constants for the 105 bones than 
the constants for the 85 bones were for their series. It would appear therefore that, 
if we suppose all bones under 45 female and use our Tables, we get results in 
reasonable accordance with Parsons’, and possibly by a theoretically more justifiable 
method than endeavouring to sex the bones above 44 and below 48 from other 
characters. 

We have considered from the same aspect the character breadth of lower 
articular end of femur. Parsons’ distribution of 89 female femora is as follows 
(p. 257): 


i | | 
q . ; Ay SPL yee ya ae llie a wh | yw Prpiliay re 
| Breadth in mm. ... | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 69 | 70 | 71 | 72 | 73 | 74 
| | | | 
| 


a : SeEeat 


In this Table he has only one bone in excess of the numbers on which he 
bases his means on pp. 250–1. If we truncate at 69.5, i.e. reject all bones over 
69 mm., we find 

$\Sigma^2 = 3.9803$, $d = 2.9058$, 
and $\psi_1 = .4714$. 


Hence we deduce 
$h' = .5295$, $\psi_2 = .977$, $\psi_3 = 1.426$. 
These lead to 
$h = 1.503$, $\sigma = 2.839$, $N = 98.4$, Mean = 68.00 mm. 
The actual values given by Parsons’ distribution above are 
$\sigma = 2.571$, $N = 89$, Mean = 67.54 mm. 

Thus the agreement is not nearly so good as for the diameter of the head of 
the femur, being about 10 % wrong in $\sigma$ and $N$. It should give as good a result 
if the method were quite satisfactory, for the bones have been sexed by the 
diameter of the head, and the limit 44 mm. for diameter of the head corresponds 
fairly closely to 69 mm. for the breadth of lower articulation. 


As this paper is not intended as a discussion of Parsons’ data, to which we 
hope again to return, we will only deal with one more illustration of the use of 
the Table. We take out from his Tables, pp. 244–248, the diameter of head of 
femur for 174 male bones. 


Diameter of Head in mm. ... | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 
Frequency ...               |  9 | 10 | 33 | 17 | 38 | 20 | 18 | 12 |  6 |  8 |  3 


The constants of this distribution are 
Mean = 49.14, $\sigma = 2.377$, $N = 174$. 
Truncating at 47.5 we have 
$\Sigma^2 = 3.4341$, $d = 2.7869$, 
whence $\psi_1 = .4422$, and from the Table 
$h' = .679$, $\psi_2 = .908$, $\psi_3 = 1.331$. 

These lead to $h = 1.718$ 
and Mean = 49.22, $\sigma = 2.530$, $N = 162$, 


i.e. to a “tail” of 40 not one of 52 below 47.5. Actually this tail distributes itself 
as follows: 
                    | Under 45 | 45 | 46 | 47 
Gaussian tail       |     5    |  6 | 12 | 17 
Against Parsons’    |     0    |  9 | 10 | 33 


This confirms our previously expressed view that probably a considerable 
number of the bones classed as 47 mm. are really female femora, and that the 
male distribution runs considerably beyond 45 mm. into the range treated as 
purely female. 
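
The Gaussian tail frequencies compared with Parsons’ can be approximated from the fitted constants. The sketch below assumes the fitted male curve has Mean 49.22, standard deviation 2.530 and total 162, and that each millimetre group runs half a unit either side of its value; the function names are hypothetical and the rounding used in the paper is not known.

```python
import math


def Phi(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tail_counts(mean, sd, total, edges):
    """Expected counts below edges[0] and between successive edges."""
    cum = [total * Phi((e - mean) / sd) for e in edges]
    return [cum[0]] + [b - a for a, b in zip(cum, cum[1:])]


# Groups: under 45, 45, 46 and 47 mm., all lying below the cut at 47.5.
expected = tail_counts(49.22, 2.530, 162, [44.5, 45.5, 46.5, 47.5])
print([round(e, 1) for e in expected], round(sum(expected), 1))
```

The four expected counts agree with the printed row 5, 6, 12, 17 to within one bone each and sum to about 40, the size of the tail stated in the text.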

Finally let us try the result of pooling male and female bones and breaking up 
the composite frequency by the method of Phil. Trans. Vol. 185 A, p. 84. 


We have now 279 bones distributed as follows: 


Diameter in mm. | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | Total 
Frequency       |  1 |  1 |  – |  4 | 10 | 18 | 16 | 24 | 13 | 23 | 13 | 33 | 18 | 38 | 20 | 18 | 12 |  6 |  8 |  3 | 279 

The constants are Mean = 46.63, S.D. = 3.93, 


$\mu_2 = 15.4040$, $\mu_3 = -6.7791$, 
$\mu_4 = 541.5162$, $\mu_5 = -5380.5339$. 
The nonic is 
$q_2^9 - 5.9618\,q_2^7 + .0689\,q_2^6 + 9.8357\,q_2^5 - 3.4275\,q_2^4 - 8.2041\,q_2^3 - .3020\,q_2^2 + .0144\,q_2 - .0097 = 0$, 
giving the root $q_2 = -.934$ and $p_2 = -9.34$, 




and ultimately the two components : 


Male Female 
Mean ‘ 2 : : f 49°83 43°72 
Population : ; ’ 133°25 145°75 
Standard Deviation . d : 2B 2°662 
Max. Ordinate . . : : 23°83 21°84 
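
The quoted root of the nonic can be checked numerically. The sketch below is an illustration only: it assumes the nine printed coefficients belong, in order, to the powers 9, 7, 6, 5, 4, 3, 2, 1 and 0 of $q_2$ (the print is damaged at the exponents), and it locates the root by plain bisection on a bracket chosen near the stated value.

```python
# Coefficients keyed by power of q2; the assignment of powers is an assumption.
COEFFS = {9: 1.0, 7: -5.9618, 6: 0.0689, 5: 9.8357, 4: -3.4275,
          3: -8.2041, 2: -0.3020, 1: 0.0144, 0: -0.0097}


def nonic(q):
    """Evaluate the ninth-degree polynomial at q."""
    return sum(c * q ** k for k, c in COEFFS.items())


def bisect(f, lo, hi, steps=60):
    """Plain bisection; assumes f(lo) and f(hi) differ in sign."""
    flo = f(lo)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid < 0) == (flo < 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)


root = bisect(nonic, -0.95, -0.92)
print(round(root, 3))
```

With this reading the polynomial changes sign between -.95 and -.92 and the root falls close to the printed -.934, which is some evidence that the powers have been assigned correctly.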


While the means agree roughly with those obtained by Parsons’ sexing (49 and 
43), we see that this analysis much more nearly equalises the number of male and 
female bones, and indeed makes the female population rather larger than the male, 
while Parsons has 79 % more males. The “truncated tail” method would probably 
give results in better accordance with the present had we not truncated at the 
quite arbitrary Dwight-Parsons’ divisions. 
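
A few arithmetical checks on the two components are easily made. The sketch below uses only the printed means, populations and maximum ordinates, together with the female standard deviation 2.662; it assumes the maximum ordinate of each component is $N/(\sigma\sqrt{2\pi})$ per millimetre group. The male standard deviation is damaged in this copy, so the value the sketch derives from the male maximum ordinate is an inference, not a figure from the paper.

```python
import math

male = {"mean": 49.83, "n": 133.25, "max_ordinate": 23.83}
female = {"mean": 43.72, "n": 145.75, "sd": 2.662, "max_ordinate": 21.84}

# The two populations should make up the 279 pooled bones.
total = male["n"] + female["n"]

# The weighted mean of the components should return the pooled mean (46.63).
pooled_mean = (male["n"] * male["mean"] + female["n"] * female["mean"]) / total

# Maximum ordinate of a normal component: N / (sd * sqrt(2 * pi)).
root_two_pi = math.sqrt(2.0 * math.pi)
female_ordinate = female["n"] / (female["sd"] * root_two_pi)

# Standard deviation implied by the printed male maximum ordinate.
implied_male_sd = male["n"] / (male["max_ordinate"] * root_two_pi)

print(total, round(pooled_mean, 2), round(female_ordinate, 2), round(implied_male_sd, 2))
```

On these assumptions the populations sum to 279, the pooled mean and the female maximum ordinate agree with the printed values, and the male standard deviation comes out at about 2.23.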

These examples may suffice to illustrate the application of the Tables to 
anthropometric measurements on man, where we can feel fairly confident that 
the material, if sufficient in quantity, would be adequately described by a Gaussian 
or normal distribution. Such cases may arise when material for the two sexes, 
or for two races, is commingled and we can be fairly certain that one or other or 
both “tails” of the material present homogeneous parts of the mixture. 


Another illustration drawn from Galton’s data for American trotters will be 
found in the Tables for Statisticians, p. xxvi. The chief weakness of the method, 
besides the assumption of the Gaussian, often quite legitimate, is the absence as 
yet of the values of the probable errors, which values must be very considerable 
for slender material such as that used above. 


See following page for Table of Gaussian “Tail” Functions. 
Table of Gaussian “Tail” Functions, “Tail” larger than “Body.” 


$h'$ | $\psi_1$ | (−)Δ | $\psi_2$ | (−)Δ | $\psi_3$ | (−)Δ | $h'$ 
00 | BS |. 008 le Bee coogi mh 200) (| oer samme 
08. | penugl” SOEs" | soso eee maa Ose eg ee 03 
See Tables for Statisticians and Biometricians, Introduction, p. xxvii. 
I. INTRODUCTORY*. 


In the summer of 1905 Professor J. W. H. Trail drew my attention to an 
extraordinary example of variation which occurred in the several organs of the 
flowers of Lepidium Draba Linnæus. At that time I examined in detail 1832 
individual flowers taken from a single plant growing in a piece of uncultivated 
ground in his garden, and the results of these observations form the basis of the 
present contribution to the study of the variation in the Cruciferæ. 


* I am pleased to have this opportunity of expressing my great indebtedness to Dr J. F. Tocher for 
invaluable assistance in the biometric part of this paper. The correlation and other constants were 
calculated in his laboratory, and without his assistance the publication of this paper would have been 
greatly delayed. I must also thank Professor Karl Pearson, in whose department in University College, 
London, the statistical study was originally undertaken, for reviewing this paper for publication and also 
for much kindly criticism and advice. To Professor Trail my thanks are also due for many botanical 
hints. 


Botanical problems, which have been hitherto attacked from the biometric 
standpoint, have been comparatively easily handled, because the material has 
been more or less homogeneous in character. For example, variations in the 
number of sepals of Anemone nemorosa* or in the number of ray-florets of 
Chrysanthemum leucanthemum†, and the consequent distribution of these are 
capable of direct treatment by Pearson’s well-known method of fitting frequency 
curves. 


The only work comparable to the one in hand occurs in Biometrika, Vol. II. 
p. 145 (Variation and Correlation in the Lesser Celandine), but in this case the 
numbers of members in the calyx, corolla and androecium have been examined 
as a basis for a study of homotypic correlation and in this flower each of these 
organs consists of a single constituent with numerous members. 


The problems studied in this paper, however, are more complex inasmuch as 
they deal not with one organ of the flower but with all the organs, their con- 
stituents and members both separately and collectively. 


It is also, I believe, the first biometric work of its kind on a cruciferous flower 
and embodies a study of chorisis, that is, “the splitting up or division of one or 
more components of a flower into two or more equal or unequal parts ”—a factor 
which is supposed to have been of the utmost importance in the evolution of the 
natural order, the Cruciferæ. A complete discussion of this phenomenon is reserved 
until the flower is studied in detail. 


It would be well here to emphasise the fact that the flowers examined for this 
study were not taken from different plants but, on the contrary, were obtained 
from several inflorescences growing on stems which had arisen from buds on the 
roots of a single parent plant. This mode of reproduction is rather unusual, but, 
in the present instance, is of particular interest inasmuch as it gives greater 
homogeneity to the material. 

The parts of the flower which have been considered are (a) the perianth, which 
consists of (1) the calyx and (2) the corolla, (b) the andrœcium and (c) the 
gynæcium. 

The functional differentiation of these organs is of great importance in the 
interpretation of results so that it might be well to recall the particular rôles 
which these play in plant economics. 

The gynæcium and the andrœcium are respectively the female and male organs 
of reproduction and consist of carpels and stamens, while the perianth forms a 
protective covering for these delicate structures. The calyx or outer organ of 
the perianth is concerned solely in the protection of the flower in the bud, but 
the corolla, in the open flower, also serves, along with the honey-secreting sacs at 
the base of the stamens, as an attraction for insects. 


* Yule, Biometrika, Vol. I. p. 307. 
† Biometrika, Vol. II. p. 309 et seq. 


The characters which have been taken as a basis for this study are numerical, 
e.g. the number of petals in the corolla, but no measurable characters, e.g. the 
length and breadth of the petals, have been considered, although, as will be pointed 
out later in connection with possible future studies in this flower, these characters 
might also with advantage be taken. 


The Cruciferæ, as an order, are usually regarded by botanists as being very 
definite in type and no observations have been recorded to show to what extent, 
if any, deviation from the recognised botanical floral formula exists, so that the 
main object of this paper was to determine the frequency of the variability of 
the parts of the various organs and constituents, and also the degrees of correlation 
existing between the organs themselves. 


The mode of observation is worthy of remark, however, as it might well be 
argued that if the flowers used for examination were fully “blown” deficiency in 
the number of parts might be due to post-developmental fracture, but in all the 
cases here recorded the observations were made on flowers in bud or only half open 
so that the influence of wind or other external agency is altogether discounted. 
The material was also examined microscopically in all cases so that there should 
be no possible doubt as to the exact origin of any member. The importance of 
this will be seen in the details of the analysis. 


II. BOTANICAL. 
1. Specific characters. 


The generic and specific characters of Lepidium Draba may be obtained in any 
complete systematic botanical work so that it is unnecessary to repeat them here, 
but a few notes bearing especially on the study in hand may be of value. 


It is a perennial about a foot in height and is covered by a minute down from 
which its popular name, the hoary cress, is derived. The inflorescence is a raceme 
not much lengthened and so forms a broad, almost flat, corymb-like termination. 
The individual flowers are small, white and numerous. The constituents of calyx, 
namely the sepals, are green; they are short, nearly equal and bear no pouch at 
the base. The petals are small and white; they are equal in size, obovate, 
undivided and generally stalked. The stamens are six in number; the filament 
is simple, i.e. it bears no appendages, and is shorter than the petals; the anther 
consists of two roundish lobes. The pods are “broader than long”; they are 
compressed laterally at right angles to the narrow partition. The thick valves are 
boat-shaped and sharply keeled but not winged; each valve contains a single seed. 
2. Morphology of the flower. 
The typical flower consists of six whorls, made up in the following manner: 
(a) Calyx 2. 
(b) Corolla 1. 
(c) Andrœcium 2. 
(d) Gynæcium 1. (Plate I, fig. 1.) 


(a) Calyx. This organ is composed of two whorls each consisting of two 
sepals. The outer pair arise at one level on opposite sides of the flower and are 
inserted on a slightly lower plane than the inner pair; they are parallel to the 
plane of compression of the gynæcium. The inner pair are also situated opposite 
one another but in a plane perpendicular to that of the outer pair; they are thus 
at right angles to the plane of compression of the gynæcium. These whorls are 
denoted on Plate I, fig. 1, by the Roman numerals I and II respectively. 


(b) Corolla. This organ consists of four petals all inserted at one level and 
alternating with the position of the sepals; they thus constitute a single whorl. 
(See III, Plate I, fig. 1.) 


(c) Andrœcium. Six stamens form the andrœcium; they arise at two different 
levels and thus constitute two separate whorls. The outer whorl, which is lower 
down, consists of two stamens which are shorter than the others and corre- 
spond in position to the inner sepals. The inner whorl consists of four stamens, 
arranged in pairs which correspond in position to the outer sepals. (A reference 
to the figure (Plate I, fig. 1) in which the two whorls are marked IV and V 
respectively will make this clear.) 


(d) Gynæcium. This organ consists of two carpels forming the sixth or 
innermost whorl. (See VI, Plate I, fig. 1.) 


It will be seen from the foregoing description that the order of the six whorls 
here detailed is that in which they would be found were we to strip the flower of 
its components at the different levels consecutively from below upwards. It is 
also the order in which we would find them, passing from the outside to the 
centre, were we to cut a transverse section through the flower. 

Another point, however, which is not so obvious but one which has special 
interest in our study, is the fact that this is also the order in time of development. 

The actual sequence in which these constituents of the flower appear in the 
bud is therefore: 
I. Outer whorl of Calyx (Sepals). 
II. Inner whorl of Calyx (Sepals). 
III. Corolla (Petals). 
IV. Outer whorl of Andrœcium (Stamens). 
V. Inner whorl of Andrœcium (Stamens). 
VI. Gynæcium (Carpels). 
3. Conception of Chorisis. 


Chorisis or reduplication is generally looked upon by botanists as a means of 
multiplication of the parts of a flower. It consists in the division or splitting 
of an organ in the course of its development by which two or more organs are 
produced in place of one. Chorisis may take place in two ways: 


(1) transversely—when the increased parts are placed one before the other, 
that is, the resulting components are on the same radius ; this is known as vertical, 
parallel or transverse chorisis ; 


(2) collaterally—when the increased parts stand side by side, that is, on the 
same circumference. 


Transverse chorisis is supposed to be of frequent occurrence; thus the petals 
of Lychnis and many other caryophyllaceous plants exhibit a small scale on the 
inner surface at the point where the limb of the petal is united to the claw. The 
formation of these scales is supposed by many to be due to the chorisis or unlining 
of an inner portion of the petal from the outer. 


Collateral chorisis is seen in different natural orders. In Streptanthus, in place 
of two stamens there is sometimes a single filament forked at the top and each 
division bears an anther. This is usually supposed to be due to collateral chorisis 
arrested in its progress. 


The flowers of the Fumitory are also generally considered to afford another 
example of this type of chorisis. In these we have two sepals, four petals in two 
rows and six stamens, two of which are perfect and four more or less imperfect. 
The latter are said to arise by collateral chorisis, one stamen being divided into 
three parts. 


Collateral chorisis may be compared, according to Bentley, to a compound leaf 
which is composed of two or more distinct and similar parts. 


Let us now consider chorisis in its bearing to the flower under consideration. 
In the description of the morphology of the flower we noted that in the inner 
whorl of the andrœcium there were four stamens arranged in pairs while in the 
outer whorl there were only two stamens situated singly. Various opinions have 
from time to time been advanced to explain this anomalous structure so that it 
might be well to briefly review these. Of the andrœcium of the Cruciferæ Oliver 
says: “The two pairs of long stamens are generally thought to be due to chorisis 
or the division in the course of development of single antero-posterior stamens. 
Others have thought that the six glands represent abortive stamens and that these 
with the six stamens make up a normal series of twelve in three whorls.” 


De Candolle held the view that the stamens formed a single, originally 
tetramerous whorl alternating with the petals in which the median members, 
i.e. the anterior and posterior, were cleft (chorised) in two. Since however the 
lateral stamens are inserted lower down than the median stamens and are also, 
as already pointed out, formed earlier in the bud, this view is clearly untenable. 
Two whorls must be taken into consideration owing to the difference in the levels 
of insertion, the single stamens being lower down. Kunth, Wydler, Chatin and 
others regard these two whorls as typically four-membered (tetramerous), those of 
the outer whorl corresponding in position to the sepals, those of the inner whorl 
corresponding in position to the petals. To arrive at a typical cruciferous flower 
from this, two stamens in the outer whorl abort, while the individuals of the two 
pairs of the inner whorl come together. (Plate I, fig. 3.) 


Others (Krause, Wretschko and Duchartre) regard the outer whorl as typically 
dimerous (i.e. with two constituents) and the inner whorl as typically tetramerous 
(i.e. four-membered). 


The more modern view, however, regards both whorls as dimerous but the 
inner one chorised collaterally thus giving the typical cruciferous flower. 


The reasons put forward to support this theory are as follows: 


(1) The upper long stamens are usually paired in the median line, also 
sometimes coherent. Further, in place of one or both of the pairs, there occurs 
sometimes a single stamen—a hint at reversion, or one or both pairs may be 
replaced by three or more—a suggestion of further chorisis. 


(2) In the earliest visible stage of development in the bud it may be seen 
that each pair of stamens arises from a single wart-like projection and that division 
is therefore a secondary result. This is not very easily demonstrable in the 
Cruciferæ but is more evident in a closely allied family, the Capparidaceæ. 


Since the present study includes numerical variation in the different constituents 
and positions of the andrœcium it will be interesting to note to what extent any 
one of these theories is borne out by the variations in this flower. 


4. Orientation of the flower. 

Having defined the positions of the various stamens relative to one another, 
in what is usually regarded as a normal cruciferous flower, let us now consider the 
different possibilities when the flower is abnormal. 


Suppose that one of the pairs of stamens of the inner whorl is represented by 
a single stamen, that is, suppose that chorisis had not taken place. Now with 
regard to the peduncle of the inflorescence this stamen might be placed in two 
diametrically opposite positions, namely (1) it might be adjacent to the peduncle 
(Plate I, fig. 4) or (2) it might be on the distal half of the flower with reference 
to the peduncle (Plate I, fig. 5). 

Two questions now arise, (1) do non-chorised stamens occur as frequently as 
chorised stamens on the side of the flower next to the peduncle? or (2) do either 
of these occur with greater frequency in this adjacent position ? 

According to which of these questions is answered in the affirmative must we 
conclude whether there is any connection or correlation between the proximity of 
the chorised stamens to the peduncle and chorisis. The former would suggest no 
correlation, whereas the degree of correlation hinted at by the latter would depend 
on the frequency of the occurrence. 


We have so far considered only two possible positions, viz. a non-chorised 
stamen adjacent to the peduncle, i.e. in the proximal half of the flower, and a 
non-chorised stamen opposite to the peduncle (i.e. in the distal half of the flower 
with reference to the peduncle), but the question naturally arises “ Are these the 
only two possible relative positions which might occur?” Might the petiole not 
twist so as to bring the hypothetic non-chorised stamen into any position varying 
from 0° to 180° with reference to the original plane ? 


Let us illustrate this by means of the Figure 6, Plate I. 


Taking the position of the peduncle as our fixed point the non-chorised stamen 
might occupy the “adjacent” position a or the “opposite” position a1. A rotation 
of the petiole, however, might cause this stamen to occupy any of the positions 
marked a2, a3 or a4 or even any intermediate position between a and a1 on either 
side of the vertical plane A–B, in the horizontal plane a, a4, a2, a3, a1. 


In a study of the variations in this flower, this is precisely what was found to 
occur, i.e. the distribution was equal round a fixed point so that we are unable to 
say whether there is any connection between the proximity of the non-chorised 
stamen to the peduncle and chorisis or not. 


But the full bearing of this consideration does not end here. The orientation 
of the flower is of practical importance in fixing a basis on which to establish 
a grouping of the different variations. Any analysis of the data is impossible 
unless some definite part of the flower be agreed upon as a starting point. 


Now we have seen that the position of the peduncle with respect to any 
definite stamen does not require to be taken into consideration. Consequently 
we may take either of the two stamens of the outer whorl, which correspond in 
position to the outer sepals and which are “normally” non-chorised, as our fixed 
point and call it 1; the stamen opposite, i.e. in the same whorl, we shall call 2; 
the chorised pair of the inner whorl to the left (or in the floral diagram above) 
may be termed 3 and 4; while the corresponding pair to the right (or in the floral 
diagram below) would thus be 5 and 6 (Plate I, fig. 7). 


Where variations occur in any of these stamens we shall hereafter refer to 
those as occurring in “position” 1, 2, 3, 4 and 5, 6 respectively. 


On this basis of symmetry, it will simplify matters considerably if we regard 
as 1, in flowers in which either of the two outer stamens is modified, that one 
which still maintains its original character while, on the other hand, if both are 
modified, that one which retains the greatest approximation to normality, e.g. if 
one be chorised while the other is not, the latter would be in position 1; or if one 




Contribution to a Statistical Study of the Cruciferæ 


were chorised while the other was only partially chorised* the latter would again 
be in position 1. 


Following on this it is at once seen that where both are normal or where both 
are equally abnormal it makes absolutely no difference which position we choose 
as 1. 
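
The convention for fixing position 1 amounts to a simple rule of comparison between the two outer stamens. The sketch below is one way of writing it down; the three state labels and their ranking are hypothetical stand-ins for the degrees of departure from normality described above, and are not categories taken from the data.

```python
# Hypothetical ranking of the states an outer stamen may show,
# from least to most modified; the labels are illustrative only.
DEPARTURE = {"normal": 0, "partially chorised": 1, "chorised": 2}


def position_one(first, second):
    """Say which outer stamen ('first' or 'second') is taken as position 1.

    The stamen nearer to the normal state is chosen; where both are
    alike the choice is indifferent and 'either' is returned.
    """
    a, b = DEPARTURE[first], DEPARTURE[second]
    if a == b:
        return "either"
    return "first" if a < b else "second"


print(position_one("chorised", "normal"))
print(position_one("partially chorised", "chorised"))
print(position_one("normal", "normal"))
```

The stamen opposite the one chosen is then position 2, and the two inner pairs follow as 3, 4 and 5, 6.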


III. EXAMINATION OF THE DATA. 
1. Classification. 


Considerable difficulty was experienced in classifying the variations owing to 
these occurring in so many different forms yet with so few characteristics in 
common as to warrant their inclusion in definite classes. 


The total number of flowers examined was 1832, of which 1062 had the 
accepted normal structure (see page 218). The remaining 770 showed variation 
in different degrees of advance or regression, i.e. there was an excess or deficiency 
in the number and structure of the members of the various organs. Thus we see 
that there was a deviation from the accepted normal structure in over 42 per cent. 
of the individuals examined. 


The perianth has been selected as a basis for classification and Table A shows 
the sub-divisions which have been adopted. Amongst those flowers in which the 
perianth was normal there were no fewer than 62 different types of variation, and 
amongst those in which the perianth showed a departure from the accepted 
normal structure there were 29 types of variation. Thus of 1802 flowers examined, 
1062 had the typical cruciferous structure, 625 had the perianth normal but the 
andrœcium and gynæcium modified in 62 different ways and 115 had all three 
organs modified in 29 different types of variation. 


TABLE A. 

                                                 | Number of Variations | Number in Group | Number in Sub-class | Number in Class | Variations in the Class 
Class I. Perianth normal                         |  –  |   –  |   –  | 1687 |  – 
  Sub-Class A. Gynæcium normal                   |  –  |   –  | 1680 |   –  |  – 
    Group (a). Andrœcium normal                  |  1  | 1062 |   –  |   –  |  – 
    Group (b). Andrœcium abnormal                | 57  |  618 |   –  |   –  |  – 
  Sub-Class B. Gynæcium abnormal                 |  –  |   –  |    7 |   –  |  – 
    Group (a). Gynæcium one carpel               |  2  |    4 |   –  |   –  |  – 
    Group (b). Gynæcium reduplicated             |  2  |    3 |   –  |   –  | 62 
Class II. Perianth abnormal                      |  –  |   –  |   –  |  115 |  – 
  Sub-Class A. Calyx normal, corolla abnormal    |  –  |   –  |   55 |   –  |  – 
    Group (a). Gynæcium normal                   | 11  |   54 |   –  |   –  |  – 
    Group (b). Gynæcium a single carpel          |  1  |    1 |   –  |   –  |  – 
  Sub-Class B. Both calyx and corolla abnormal   |  –  |   –  |   60 |   –  |  – 
    Group (a). Gynæcium normal                   | 11  |   46 |   –  |   –  |  – 
    Group (b). Gynæcium a single carpel          |  6  |   14 |   –  |   –  | 29 
Totals                                           | 91  | 1802 | 1802 | 1802 | 91 


* For the present we use the terms “chorised” and “chorisis” in the sense of the definition already given. 


The remaining 30 individuals are not capable of classification under the fore- 
going scheme but have been grouped into three classes as shown in Table B. 


TABLE B.

                                                               Number of    Number of individuals
                                                               Variations   in the Class
Class III. Reduplication of parts but flowers not separate         10              11
Class IV.  Reduplication of parts with flowers separate             6              17
Class V.   Part of a flower replaced by a flower                    2               2
Totals                                                             18              30


Altogether, therefore, there are five separate classes which give a total of 109 
different modes of variation. 
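

The totals quoted in this section can be checked against the group counts of Tables A and B. The following is only a minimal sketch: the dictionary keys are hypothetical labels introduced here for illustration, and the counts are those read from the two tables.

```python
# (variations, individuals) for each group of Table A and each class of Table B.
# The keys are illustrative labels, not notation used in the paper.
table_a = {
    "I.A.a": (1, 1062), "I.A.b": (57, 618), "I.B.a": (2, 4), "I.B.b": (2, 3),
    "II.A.a": (11, 54), "II.A.b": (1, 1), "II.B.a": (11, 46), "II.B.b": (6, 14),
}
table_b = {"III": (10, 11), "IV": (6, 17), "V": (2, 2)}


def totals(table):
    """Return (total variations, total individuals) for one table."""
    variations = sum(v for v, _ in table.values())
    individuals = sum(n for _, n in table.values())
    return variations, individuals


var_a, ind_a = totals(table_a)
var_b, ind_b = totals(table_b)

total_flowers = ind_a + ind_b                      # all flowers examined
total_modes = var_a + var_b                        # all modes of variation
abnormal = total_flowers - table_a["I.A.a"][1]     # flowers departing from the normal structure
percent_abnormal = 100 * abnormal / total_flowers

print(total_flowers, total_modes, abnormal, round(percent_abnormal, 1))
```

Under these counts the sketch gives 1832 flowers, 109 modes of variation and 770 abnormal flowers, in agreement with the figures stated in the text.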


2. Analysis. 


In the further reduction of the data it is essential that we consider the
variations in the stamens, and for this purpose we must naturally commence
with Class I, Sub-class A.


To avoid describing each of these in detail, it is necessary to have recourse to 
a graphic method of representation. Several such methods suggested themselves 
and although none are ideal we have chosen one which may help to give a true 
impression of the various modifications assumed by the androecium. We shall
also give a few examples by another method which might have been adopted but 
which seems to us to be even more complicated. 


Let us, in the first place, consider in what directions abnormalities have 
occurred. A typical stamen consists of two parts, (1) the filament and (2) the 
anther. 


(1) Filament. This may be of its normal length or less than its normal 
length or altogether absent. 


(2) Anther. This may be present or absent. 


But other complications arise. As already explained, in the accepted typical 
cruciferous flower, chorisis has taken place in positions 3.4 and 5.6 so as to give 
rise to two stamens in each of these positions. Now, we find that, in certain
flowers chorisis has only partially taken place and in others it has not occurred 
at all so that we have thus another three possibilities to consider. 


In describing the androecium, therefore, we must (1) define the position of 
each stamen to which we refer, (2) state the nature of the filament, (3) note the 
presence or absence of the anther and (4) emphasise the nature of the chorisis. 


Let us use the following symbols: 1, ½ and 0.


1 with reference to
    Filament, indicates that it is present and complete.
    Chorisis, indicates that it is total or complete.
    Anther, indicates that it is present.


½ with reference to
    Filament, indicates that it is only half-length.
    Chorisis, indicates that it is only partial.


0 with reference to
    Filament, indicates that it is absent.
    Chorisis, indicates that it has not taken place.
    Anther, indicates that it is absent.


We have already fixed upon our nomenclature for the various positions; these
are 1; 2; 3.4; and 5.6. To avoid descriptions and at the same time give a
graphic representation of the floral formula of the androecium the following
system might be adopted:


(1) Place the whole floral formula within square brackets thus [ ].
(2) Place positions 1; 2; 3.4; and 5.6 within curled brackets thus { }; and
(3) Place individuals, i.e. 1, 2, 3, 4, 5 and 6, within rounded brackets thus ( ).


Expanding this with reference to a normal flower we would have for the
androecium only


[{1} {1} {2} {2}],


or still further, in the order of Filament, Chorisis, Anther for each stamen,


[{(1.0.1)} {(1.0.1)} {(1.1.1) (1.1.1)} {(1.1.1) (1.1.1)}].


Or, taking an actual example from our data : 


Stamen number 1 is normal and complete, and there is no chorisis; stamen 
number 2 has a filament only half-length but the anther is present and complete, 
and there is no chorisis; stamen number 3 is normal and complete; stamen 
number 4 is only half-length but with a complete anther—chorisis between 
3 and 4 is complete; stamens 5 and 6 are only half the normal length but have 
complete anthers—chorisis between 5 and 6 is complete. 


This would be represented thus:

[{(1.0.1)} {(½.0.1)} {(1.1.1) (½.1.1)} {(½.1.1) (½.1.1)}].

Another graphic method and the one which we have adopted is as follows.
Each flower is represented in a table similar to the following:


Frequency   Number of Diagram   Stamen   Filament   Chorisis   Anther


The first vertical column gives the frequency of the variation or the number 
of individuals examined with this structure. The second vertical column gives 
the number of the corresponding diagram in the plates. The third vertical 
column gives the individual stamens in the positions already defined while the 
other three columns denote the various factors to be considered. The different 
possibilities of variation in these may be shown by the symbols 1, ½ and 0 as
already defined. It should be noted, however, that in positions 1 and 2 a dash 
(—) will be placed in the chorisis column to indicate that these are typically 
non-chorised stamens and that absence of chorisis does not therefore indicate 
abnormality. Representing the same example as before, by this method, we 
would have: 


Frequency   Number of Diagram   Stamen   Filament   Chorisis   Anther
                                   1        1          —         1
                                   2        ½          —         1
                                   3        1          1         1
                                   4        ½          1         1
                                   5        ½          1         1
                                   6        ½          1         1
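

The bracket formula and the tabular form describe the same flower, so one record can be rendered both ways. The sketch below is illustrative only: the function names and the tuple layout (filament, chorisis, anther) are assumptions introduced here, with each of the four positions 1; 2; 3.4; 5.6 held as a list of stamens.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def symbol(value):
    """Render 1, 1/2 or 0 with the symbols of the text; None stands for the dash."""
    if value is None:
        return "—"
    return {Fraction(1): "1", HALF: "½", Fraction(0): "0"}[Fraction(value)]


def stamen(filament, chorisis, anther):
    """One stamen in rounded brackets, in the order Filament, Chorisis, Anther."""
    return "(" + ".".join(symbol(v) for v in (filament, chorisis, anther)) + ")"


def formula(positions):
    """Whole androecium: square brackets around curled brackets around stamens."""
    groups = ["{" + " ".join(stamen(*s) for s in pos) + "}" for pos in positions]
    return "[" + " ".join(groups) + "]"


def table_rows(positions):
    """Rows (stamen number, filament, chorisis, anther) of the tabular method."""
    rows, number = [], 0
    for index, pos in enumerate(positions):
        for filament, chorisis, anther in pos:
            number += 1
            # Positions 1 and 2 are typically non-chorised, so a dash is shown.
            shown = None if index < 2 else chorisis
            rows.append((number, symbol(filament), symbol(shown), symbol(anther)))
    return rows


# The worked example of the text: positions 1; 2; 3.4; 5.6.
example = [
    [(1, 0, 1)],
    [(HALF, 0, 1)],
    [(1, 1, 1), (HALF, 1, 1)],
    [(HALF, 1, 1), (HALF, 1, 1)],
]

print(formula(example))
for row in table_rows(example):
    print(*row)
```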


The following table shows graphically the types of variations illustrated in
Figs. I—LVIII, i.e. Class I, Sub-class A, flowers in which the perianth and
gynoecium are both normal.


It will be seen that in the flowers illustrated in Figs. XLVIII—LVIII another
complication has crept in. Stamens 3, 4, 5 and 6 have themselves sometimes
undergone partial or total secondary chorisis. In the tables, therefore, by
sub-dividing the squares containing the details we can thus adhere to our initial
nomenclature. Let us take the three most difficult examples to illustrate this.


(1) Fig. XLVIII. The division corresponding to stamen 3 is sub-divided. This
would indicate that in this position there were actually two stamens. The nature
of each of these individual stamens is, as before, given in the sub-divisions. In
this example what really occurs is: There are two individual stamens each equal
to the original length and bearing a complete anther and separated from one
another by secondary chorisis.


(2) Fig. L. Divisions 5 and 6 are sub-divided to show that there has been 
secondary chorisis in both of these stamens. In the former, chorisis has been 
complete but has resulted in one being full-length and with a functioning anther 
while the other is only half-length with a functioning anther. In the latter, 
chorisis has not been complete inasmuch as only the anther has been chorised. 


(3) Fig. LII. Divisions 5 and 6 are both sub-divided, consequently we may
infer that both of these stamens have undergone some stage of chorisis. In the
first column we see that all are full-length, in the third that all have functioning
anthers, but the second tells us that chorisis has been only partial in each case.
The symbol ½ between 5 and 6 indicates that between these two chorisis has also
been partial. Thus we conclude a state of affairs as follows: In position 5.6 
(1) there arises a single filament which divides into two at some distance from 
the base and (2) that each of these again sub-divides and (3) that on the end of 
each of these four sub-filaments there arises a functioning anther. The others 
may be worked out in a similar manner but a reference to the diagrams will at 
once obviate any misrepresentation. 


From the foregoing table and illustrations it is evident that further classi- 
fication is possible but it would be well to point out here certain difficulties 
which arise. As an example let us consider such a case as (using our original 
terminology) that in which, in any of the positions (1; 2; 3.4; or 5.6), the stamens 
are represented thus (1.1.0) (0.0.0), thus (½.1.1) (½.1.1) or thus (½.0.1) (½.0.1).
Which shall have precedence? If we are to consider these variations as deviations 
from the usually accepted normal cruciferous flower, then we may safely assume 
that that flower which has the greatest number of functioning parts in a certain 
position is less aberrant than one in which any or all of the parts are altogether 
wanting; while, on the other hand, if in a position in which chorisis normally takes 
place, we have defective groups like those in cases 2 and 3 cited above, in one of 
which chorisis has taken place but not in the other, we must consider that group 
in which chorisis has occurred as being the one less removed from normal. On 
this basis then the above examples would be placed in the following order with 
regard to normality : 


(1) (½.1.1) (½.1.1); (2) (½.0.1) (½.0.1); (3) (1.1.0) (0.0.0).
Similarly for any of the others. 
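

The ordering rule just stated can be sketched as a sort key. This is one possible reading, not the authors' procedure: it assumes that a "functioning part" is any filament or anther that is present (a half-length filament counting as present), and that ties are broken in favour of the group in which more chorisis has occurred. The names below are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def normality_key(position):
    """Sort key: more functioning parts first, then more complete chorisis.

    Each stamen is a (filament, chorisis, anther) tuple using 1, 1/2 and 0.
    """
    parts = sum((filament > 0) + (anther > 0) for filament, _, anther in position)
    chorisis = sum(c for _, c, _ in position)
    return (-parts, -chorisis)


# The three groups discussed in the text, in the order in which they are first cited.
cases = {
    "(1.1.0) (0.0.0)": [(1, 1, 0), (0, 0, 0)],
    "(½.1.1) (½.1.1)": [(HALF, 1, 1), (HALF, 1, 1)],
    "(½.0.1) (½.0.1)": [(HALF, 0, 1), (HALF, 0, 1)],
}

ranked = sorted(cases, key=lambda name: normality_key(cases[name]))
print(ranked)
```

With these assumptions the sketch places the groups in the same order of normality as the text.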


Consequently we are now in a position to classify the actual cases under 
observation. 

So far we have considered only those flowers in which there was the typical
number of stamens, with their manifold variations in size and structure, but now
we must classify those in which secondary chorisis has given rise to more than
the accepted number.


Let us take stamens 5 and 6 as our basis, i.e. those individuals in which
position 5.6 is occupied by more than two stamens.


The relative frequencies of the different types of variation in the androecium
in the 618 specimens so far considered (see Table C) are very interesting.
The number 1062 in Table F refers to 1062 flowers in which the androecium was

TABLE D.

                                                                          Number of             Number of
                                                                        Variations in         Individuals in
                                                                      Section   Sub-group   Section   Sub-group
Group a.
  Whole of the androecium normal. Fig. I                                 —          1          —        1062
Group b.
  Androecium variously modified.
  Sub-group α. Outer whorl normal (5 and 6 variously modified)           —         21          —         480
    Section i. Stamens 3 and 4 normal. Figs. II—IX                       8          —         419         —
    Section ii. Stamen 3 normal, 4 represented thus (½.1.1).
      Figs. X—XIV                                                        5          —          25         —
    Section iii. Stamens 3 and 4 thus {(1.0.1) (1.0.1)}.
      Figs. XV—XVIII                                                     4          —           9         —
    Section iv. Stamens 3 and 4 thus {(½.1.1) (½.1.1)}. Fig. XXII        1          —           2         —
    Section v. Stamens 3 and 4 replaced by one. Figs. XIX—XXI            3          —          25         —
  Sub-group β. Outer whorl represented thus {(1.—.1) (½.—.1)}            —         13          —          54
    Section i. Stamens 3 and 4 normal. Figs. XXIII—XXIX                  7          —          42         —
    Section ii. Stamens 3 and 4 thus {(1.1.1) (½.1.1)}.
      Figs. XXX—XXXIV                                                    5          —          11         —
    Section iii. Stamens 3 and 4 thus {(½.0.1) (0.0.0)}. Fig. XXXV       1          —           1         —
  Sub-group γ. Stamens 1 and 2 represented thus {(½.—.1) (½.—.1)}        —          4          —          16
    Section i. Stamens 3 and 4 normal. Figs. XXXVI and XXXVII            2          —          11         —
    Section ii. Stamens 3 and 4 thus {(1.1.1) (½.1.1)}.
      Figs. XXXVIII and XXXIX                                            2          —           5         —
  Sub-group δ. Stamen 1 normal, 2 absent                                 —          5          —          29
    Section i. Stamens 3 and 4 normal. Figs. XL—XLIII                    4          —          27         —
    Section ii. Stamen 3 normal, 4 thus (½.1.1). Fig. XLIV               1          —           2         —
  Sub-group ε. Stamen 1 thus (½.—.1), 2 absent. Figs. XLV—XLVII          —          3          —           6


TABLE E.

                                                                          Number of             Number of
                                                                        Variations in         Individuals in
                                                                      Section   Sub-group   Section   Sub-group
  Sub-group η. Stamens 1 and 2 are normal                                —          9          —          30
    Section i. Stamens 3 and 4 are represented by three.
      Fig. XLVIII                                                        1          —           3         —
    Section ii. Stamens 3 and 4 are normal. Figs. XLIX—LI                3          —          16         —
    Section iii. Stamens 3 and 4 represented thus
      {(1.1.1) (½.1.1)}. Fig. LII                                        1          —           2         —
    Section iv. Stamens 3 and 4 thus {(1.0.1) (1.0.1)}.
      Figs. LIII—LV                                                      3          —           5         —
    Section v. Stamens 3 and 4 represented by one. Fig. LVI              1          —           4         —
  Sub-group θ. Stamens 1 and 2 thus {(½.—.1) (½.—.1)}. Fig. LVII         —          1          —           1
  Sub-group κ. Stamen 1 normal, 2 absent. Fig. LVIII                     —          1          —           2

TABLE F.
Frequencies more than 3 in order of magnitude.
(References have been made to the figures.)

Figure    Frequency    Figure    Frequency    Figure    Frequency
I           1062       XXVIII       11        XXV           7
VIII         227       XXXVI         9        VII           6
III          130       VI            8        X             5
IV            38       XII           8        XLII          5
XX            21       XIII          8        LI            5
XL            18       XLIX          8        XVIII         4
XXIII         15       IX            7        LVI           4


normal. Where variation occurs, the greatest frequency, namely 227, occurs in
flowers in which one of the pairs in the inner whorl is replaced by a single
stamen while the next highest frequency, namely 130, occurs in those flowers in
which partial chorisis has taken place in the inner whorl of the androecium.
Following this the magnitude of the frequencies diminishes rapidly. The next,
namely 38, occurs in flowers in which nearly all the parts of the androecium
are modified while, near this, is the frequency 21 which exists in flowers having
only one stamen in each position. In the next two frequencies, namely 18 and 15,
we find that stamens 1 and 2 are involved.


From these raw data we can see that the inner whorl of the androecium is the
whorl most subject to variation and further that this variation is in the direction
of a decrease in number.


TABLE G.
Class I, Sub-class B: Perianth normal, Gynoecium abnormal (see Table A).
saleseey Seales ae me 1S ob ede 
et eS lie valle tes Willers eae Perse) Were eb |) ee tae il ae i) 3 
5) os og 2 ne s a) Ons, lone 2 Z a 
2) 2h |28| 8 | S| S72) 26 (48/8) 5] s 
o| 82 (dela |S ) Sele ee ee Sake eee 
| etal, a Bran sea |e 
| 21) E 
| 1 1 — 1 1 1 — 1 
2 1 — 1 2 1 — i 
‘ - 3 1 1 1 ‘a 3 1 1 1 
¢ Pes 4 1 1 1 ; Ibs 4 1 1 1 
| 5 1 0) 1 5 1 1 1 
6 0) 0) 0 6 4 1 1 
l 1 1 1 1 1 _— 1 
ep tenis Ooa|ot )|s na 
2 1 1 1 3 1 1 1 
; 1 1 1 4 1 1 1 
: SoM. Ona lpeil alee ee i. | lentil 
1 SET 4 i 0 1 2 | LXITI 5 l 1 l 
1 1 1 6 1 1 al 
5 1 Lele 1 1 1 
1 1 1 
@ aa Oak a 


Class II: Perianth abnormal. 


Variations in the members of the perianth (calyx and corolla) have necessitated 
the introduction of new symbols in the diagrams. These are shown in the com- 
posite diagram Plate I, fig. 10, and are explained on p. 257. 


It will be evident from Table H, p. 233, that the same type of variation in the
androecium occurs with different types of variation in the perianth, e.g. in the
second and fifth figures no fewer than six different variations in the perianth
accompany a single type of variation in the androecium. Reference to the
diagrams in the plates will show what these variations are and will render a
detailed explanation unnecessary. The asterisk in Table H under LXXVIII
indicates that there has been adhesion between stamen 1 and one of the
stamens in position 3.4, in other words between one of the members of the
outer whorl and one of the members in the inner whorl.


Class III.

The members of this class are characterised by a reduplication of the various
organs but without separation into two distinct flowers. There are in all 11
individuals with 10 different types of variation. A word of explanation is
necessary with regard to the interpretation of the position of the various stamens
in these flowers. The stamens in the inner whorl are longer than those in the
outer whorl. Consequently if these were reduced in length they might easily be
mistaken for members of the outer whorl. In all cases of difficulty, however, the
crucial test, the point of origin, was applied and the positions assigned to the
various members as shown in the diagrams were determined microscopically in
this manner. See Table I, p. 234.


TABLE I.
Class III.


Class IV.


In this class there are 17 individuals giving six different modes of variation. 
Reduplication has taken place to such an extent as to give rise to two separate 
flowers on one pedicel. Each of the flowers was diminutive in size. Two tables 
are thus necessary for each "flower," A and B: see Table J, p. 235.


Class V. 


This class has been formed to include two very aberrant flowers showing two 
distinct variations. In both cases part of the flower has been replaced by another 
flower, in one case normal in the other slightly divergent. CIX is one of these 
in which the original flower is normal except that one of the carpels has been 
replaced by a small flower (see Table K and diagram, Plate X). CX is the other. 
In the original flower stamen 1 has been chorised and one of the chorised parts 
has given origin to a separate flower (see Table K and diagram, Plate X). 


TABLE K. 
Class V. 
o v o a Ric o os os a a 
2 ££ | od q 5 ~S = 2 &p 2a g qe 3 
S| eS es s s 5 ge | 8s| 3 | 5 
A = Bn = c y TD a es Ss) 
cap elena, Se ee hee Fe 
1 1 me 1 1 1 aa 1 
2 1 = 1 2 1 ae 1 
eC 3 1 i 1 ; CIX 3 1 0 1 
Z 4 1 1 1 B 4 0 0) 0 
5 1 feta ear 5 1 0 i 
6 1 Ik 2p Pak | 6 0) 0 0) 
; 1 OA ahs al 1 1 25 1 
1 (0) * 2 1 = 1 
; en 3 1 1 1 
a a ee 1 B 4 i 1 1 
ay 1 1 1 5 1 1 1 
1 | CX] 8 1 1 1 6 1 1 1 
A 
4 1 1 1 
| 
1 1 1 
1 1 1 
| 
6 1 1 Weed 
| 


The asterisk denotes the position of the origin of the secondary flower. 


This concludes our analysis of variations LIX to CX both as to perianth and
androecium, but before proceeding to the statistical part it is desirable that certain
peculiarities should be observed and that an understanding be arrived at with
regard to the interpretation of these.

In this procedure 44 variations, namely LIX to CII, must be dealt with.
When we study the number of parts which occur in the position of individual 
members of a whorl and then try to draw conclusions as to normality or abnormality 
of the whorl itself we find the following difficulties. 


Let us take the outer whorl of the androecium as an example.


(1) If in the position normally occupied by stamen 1 there were two stamens 
and in the position normally occupied by stamen 2, no stamen occurred, then with 
regard to the whorl the total number of stamens would be two. Now this is the 
accepted normal number of stamens in the outer whorl, so that if number alone 
were considered the inference would legitimately be drawn from the table that the 
whorl was normal. But this is not so! 


Or (2) If in position number 1, one normal stamen occurred and in position 
number 2, one functioning stamen, with the filament only half the normal length, 
occurred, then the number of functioning stamens in the whorl would be two, i.e.
the accepted normal number. But again, on the basis of number alone, we should 
not be able to say whether the whorl as a whole was normal or abnormal. 


Now as this state of affairs exists not only in the whorl under consideration 
but in all the whorls of the flower, we have thought it not only advisable but 
necessary to emphasise these abnormalities as a safeguard in the interest of 
systematic statistical treatment. 


For this purpose, therefore, small diagrammatic formulae have been drawn up,
and these have been given in conjunction with the diagrams: see Plate I, figs. 
11, 12. 


We have already defined the positions of the various parts of the androecium 
but have hitherto refrained from naming the different constituents of the perianth. 


In the cases under consideration, however, it is necessary to do so, and Plate I, 
fig. 11 illustrates how these are definitely determined. 


The two outer sepals are named A and B (see Fig. 11). A corresponds in
position to stamens 3.4 and B to stamens 5.6. C and D are the two inner
sepals; C corresponds in position to stamen 1 and D to the stamen in position 2.
The petals are named A′, B′, C′ and D′ and lie respectively between sepals A and C,
B and D, A and D, and B and C.


The actual order of all the parts is summed up in Plate I, fig. 12 (1—14).


IV. STATISTICAL.


The analysis which we have given of 1813 flowers is sufficient to show that the 
idea of a definite fixed number of sepals in the calyx, of petals in the corolla, of 
stamens in the androecium or of carpels in the gynoecium of cruciferous plants is
not upheld by an examination of a large number of flowers of this species. In 
less than 1 per cent. of the flowers examined there was an increase* or decrease 
in the number of sepals in the calyx; in less than 1 per cent. there was also an 
increase or decrease in the number of petals in the corolla, but in 2 per cent. there 
was an increase in the number of stamens in the androecium, while in 22 per cent.
there was a decrease in the number. 


Since then the number of sepals, petals and stamens is not absolutely fixed for 
any of the organs it becomes necessary now to consider whether the number of 
members in one organ is related to the number in the others. 


As has already been pointed out we have not only to consider organs as a 
whole, but, in the case of the calyx and the androecium, the constituents of these
organs, owing to the fact that these organs are each divided into two separate 
whorls which are inserted at different levels and are placed in directions at right 
angles to one another. 


Further, a special study has been made of the various positions in the androecium
to ascertain to what extent bilateral symmetry may be regarded as an inherent 
character of the flower under consideration. 


By this means also it seems that some definite information might be obtained
with regard to the perplexing and, at present, hypothetical theory of chorisis, the 
reasons for the existence of which have been summarised on p. 219. 


The statistical part has been divided into two sections: 
(1) a study of the Means and Standard Deviations, and
(2) a study of the Correlation Coefficients. 


1. Study of the Means and Standard Deviations.


Although it is obvious from the analysis of the data under consideration that 
the numbers given for the botanical floral formula, namely, Calyx—4, Corolla—4, 
Stamens—6 and Gynoecium—2, are the nearest integers, it is not at all certain
from a mere inspection of the tables whether the actual means deviate from this 
number in the direction of excess or deficiency. 


* Where chorisis of a sepal or petal has resulted in two or more distinct individuals we have regarded
each of these as a distinct sepal or petal in recording the numbers. This method is natural however
inasmuch as it is the only means by which we may possibly trace reduplication of parts.

Consequently the mean and standard deviation for each of the organs and its
constituents have been calculated and these are given in the following table:


TABLE M.
Means and Standard Deviations of the Number of the Organs and their
Constituents.

Organ         Constituent    Member          Mean     Standard     Coefficient
                                                      Deviation    of Variation
Calyx         —              —              3.9796     .2683         6.7415
  "           Outer whorl    —              1.9757     .1979        10.020
  "           Inner whorl    —              2.0039     .1304         6.507
Corolla       —              —              3.9520     .3523         8.914
Androecium    —              —              5.8092     .7567        13.025
  "           Outer whorl    —              1.9570     .2704        13.817
  "           —              Stamen 1        .9840     .1602        16.280
  "           —              Stamen 2        .9713     .1915        19.715
  "           Inner whorl    —              3.8577     .6588        17.077
  "           —              Stamens 3.4    1.9950     .2728        13.674
  "           —              Stamens 5.6    1.8627     .4863        26.107
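

The three numerical columns of Table M are tied together if the coefficient of variation is taken in its usual sense, 100 times the standard deviation divided by the mean. That definition is an assumption here, since the text does not state it, and the sketch below merely checks it against rows whose three entries are all legible. The function and variable names are illustrative.

```python
# (mean, standard deviation, coefficient of variation) as printed in Table M.
rows = {
    "Calyx, outer whorl": (1.9757, 0.1979, 10.020),
    "Calyx, inner whorl": (2.0039, 0.1304, 6.507),
    "Corolla": (3.9520, 0.3523, 8.914),
    "Androecium": (5.8092, 0.7567, 13.025),
    "Stamen 1": (0.9840, 0.1602, 16.280),
    "Stamen 2": (0.9713, 0.1915, 19.715),
    "Stamens 3.4": (1.9950, 0.2728, 13.674),
    "Stamens 5.6": (1.8627, 0.4863, 26.107),
}


def coefficient_of_variation(mean, sd):
    """Assumed definition: 100 * standard deviation / mean."""
    return 100.0 * sd / mean


# Largest disagreement between the recomputed and the printed coefficient.
max_gap = max(abs(coefficient_of_variation(m, s) - cv) for m, s, cv in rows.values())

# The relation can be inverted to recover a standard deviation from the other
# two columns, e.g. for the calyx (mean 3.9796, coefficient 6.7415).
calyx_sd = 6.7415 * 3.9796 / 100.0

print(round(max_gap, 4), round(calyx_sd, 4))
```

The recomputed coefficients agree with the printed ones to within rounding, which supports reading the garbled entries of the table through the same relation.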


(1) The most obvious result which is revealed by these constants is the fact
that in all cases (except the inner whorl of the calyx) the actual mean of the organs
is less than the recognised typical number, thus:


The mean number for the calyx is 3.979 instead of 4.

The mean number for the corolla is 3.952 instead of 4.

The mean number for the androecium is 5.809 instead of 6.

(2) The inner whorl of the calyx shows the smallest departure from the
accepted typical number, namely, 2.004 instead of 2.


Let us, however, test how far the differences in the character of the analogous 
parts are significant by ascertaining the Probable Error of the difference of the 
means of the characters. 


TABLE N (1).
I. Constituents of the Calyx.

Constituent    Mean Number    Standard Deviation
Outer whorl      1.9757            .1979
Inner whorl      2.0039            .1304


The difference here is $D = .02813$ and the probable error of the difference
$E_D = .0037$; thus the value $D/E_D = 7.6$. The difference is therefore clearly
significant.

It is worthy of notice that the outer whorl of the calyx is more variable than
the inner whorl and that it possesses on an average fewer sepals.
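

The probable error of a difference of means can be sketched as follows. The formula used is the classical one, 0.67449 times the square root of the sum of the two squared standard errors; the paper does not print it, so this is an assumption, as are the sample size of 1813 flowers (the figure quoted at the head of the statistical part) and the neglect of any correlation between the two constituents.

```python
from math import sqrt

PROBABLE_ERROR_FACTOR = 0.67449  # probable error = 0.67449 x standard error


def probable_error_of_difference(sd1, sd2, n):
    """Probable error of the difference of two means, each based on n flowers.

    Assumes the two means are treated as independent (no correlation term).
    """
    return PROBABLE_ERROR_FACTOR * sqrt((sd1 ** 2 + sd2 ** 2) / n)


N_FLOWERS = 1813  # assumption: sample size taken from the statistical section

# Outer and inner whorl of the calyx (Table N (1)).
calyx_e = probable_error_of_difference(0.1979, 0.1304, N_FLOWERS)
calyx_ratio = 0.02813 / calyx_e

# Positions 3.4 and 5.6 of the inner whorl of the androecium (Table M).
inner_e = probable_error_of_difference(0.2728, 0.4863, N_FLOWERS)
inner_ratio = (1.9950 - 1.8627) / inner_e

print(round(calyx_e, 4), round(calyx_ratio, 1), round(inner_ratio, 1))
```

Under these assumptions the calyx ratio comes out between 7 and 8 and the inner-whorl ratio close to 15, of the same order as the values given in the text; small differences are to be expected from rounding and from the assumed sample size.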
TABLE N (2).
II. Members of the Outer Whorl of the Androecium.

Member        Mean Number    Standard Deviation
Position 1       .9840            .1602
Position 2       .9713            .1915


The difference here is $D = .01268$ and $E_D = .00391$; the value $D/E_D$
is 3.2.


Thus when the two positions of the outer whorl of the androecium are taken
into consideration, a probably significant difference is found between the means of 
the distribution of the parts of this whorl. Now in position number 1 there is 
a greater approach to the accepted type owing to the fact that when the com- 
ponent of one of the positions of the outer whorl was found to depart from the 
accepted type, the other position was selected as the starting point for the 
orientation of the flower and was called position number 1. It is all the more 
noteworthy that the deviation for position 2 is not in the direction of greater but 
of lesser frequency and the variability of position 2 is greater. We have thus 
again a reduction in the value of the type with greater variability. 


TABLE N (3).
III. Members of the Inner Whorl of the Androecium.

Member          Mean Number    Standard Deviation
Position 3.4      1.9950            .2728
Position 5.6      1.8627            .4863


Here the ratio $(m_1 - m_2)/E_{(m_1 - m_2)}$ is nearly 15 and therefore there is quite a
significant difference between the means of the distributions of the two members of
the inner whorl of the androecium. In both cases the tendency is towards a
suppression of functioning stamens rather than an increase, together with greater
variability in the case where the reduction from the accepted type is more marked.

This difference in variability is, in the main, real and is not due to the arbitrary
selection of the 3.4 position. This will be evident from a study of Table XIX.
It will there be seen that there were 1754 cases where two stamens occurred in
one of the positions of the inner whorl of the androecium. This is the number
in the accepted type, and thus there is no variability. What is the nature of the
distribution of the stamens in the other position (Table XVIII)? It is as follows:


Number of stamens      1       2      3     4    Total
Frequency             287    1436    26     5    1754


The mean for this array is 1.8569 stamens, with a variability of .4116. Thus
when there is no variability in one position of the inner whorl of the androecium
there is a large variability in the other position.


Similarly we find the following distribution for position 2 in the outer whorl
of the androecium when position 1 is of the accepted type, i.e. shows no variability.


Number of stamens      0       1      2    Total
Frequency              57    1708     1    1766


The mean for this array is .9683 with a variability of .1784. Again therefore,
when there is no variability in position 1, there is a reduction of type in position 2
with great variability.
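

The means and variabilities quoted for these two arrays can be recomputed from the frequencies. The sketch below takes "variability" to mean the standard deviation with divisor n, and assumes that the classes of the second array are 0, 1 and 2 stamens; both points are inferences from the printed results rather than statements of the text.

```python
from math import sqrt


def mean_and_sd(frequencies):
    """Mean and standard deviation (divisor n) of a frequency array.

    `frequencies` maps each observed number of stamens to its frequency.
    """
    n = sum(frequencies.values())
    mean = sum(x * f for x, f in frequencies.items()) / n
    variance = sum(f * (x - mean) ** 2 for x, f in frequencies.items()) / n
    return mean, sqrt(variance)


# Other position of the inner whorl when one position holds two stamens.
inner_other_position = {1: 287, 2: 1436, 3: 26, 4: 5}
# Position 2 of the outer whorl when position 1 is of the accepted type
# (class values 0, 1, 2 are assumed).
outer_position_2 = {0: 57, 1: 1708, 2: 1}

inner_mean, inner_sd = mean_and_sd(inner_other_position)
outer_mean, outer_sd = mean_and_sd(outer_position_2)

print(round(inner_mean, 4), round(inner_sd, 4))
print(round(outer_mean, 4), round(outer_sd, 4))
```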


2. Study of the Correlation Coefficients. 


For the purposes of this study a number of correlation tables have been 
prepared and as the results of these will have to be considered under different 
groupings it seems advisable to tabulate them, and insert them consecutively. 
The system which has been adopted to facilitate reference is to commence with 
the outer whorl of the calyx and consider all its relations with the other whorls of
the flower passing from the outside inwards; following this comes the inner whorl 
of the calyx and its relations with the other constituents of the flower from the 
outside inwards and so on. 


The following table shows the characters studied and the correlation coefficients 
found. 


In order to make the comparison of the various correlations as complete as
possible it will be necessary to consider each constituent or organ with all the
other constituents or organs and to avoid overlapping as far as possible. The
most natural method would be to commence either with the outermost constituent,
namely, the outer whorl of the calyx, or with the innermost constituent, namely,
the inner whorl of the androecium. For reasons of a morphological character, which
will be seen later, the inner whorl of the androecium has been chosen as the
starting point.


TABLE O.

Correlation Coefficients between the Number of Various Organs and their Constituents.

Characters                                                                           Table      r
The outer whorl of the calyx and the inner whorl of the calyx                         I         .1957
The outer whorl of the calyx and the corolla                                          II        .7275
The outer whorl of the calyx and the outer whorl of the androecium                    III       .5886
The outer whorl of the calyx and the inner whorl of the androecium                    IV        .2613
The outer whorl of the calyx and the androecium                                       V         .4371
The inner whorl of the calyx and the corolla                                          VI        .2476
The inner whorl of the calyx and the outer whorl of the androecium                    VII       .3229
The inner whorl of the calyx and the inner whorl of the androecium                    VIII      .3905
The inner whorl of the calyx and the androecium                                       IX        .4592
The calyx and the corolla                                                             X         .6926
The calyx and the outer whorl of the androecium                                       XI        .6245
The calyx and the inner whorl of the androecium                                       XII       .4014
The calyx and the androecium                                                          XIII      .5721
The corolla and the outer whorl of the androecium                                     XIV       .4762
The corolla and the inner whorl of the androecium                                     XV        .1773
The corolla and the androecium                                                        XVI       .3174
The outer whorl of the androecium and the inner whorl of the androecium               XVII      .1984
The inner whorl of the androecium, position 3.4, and the inner whorl of the
  androecium, position 5.6                                                            XVIII     .4305
The outer whorl of the calyx and the inner whorl of the androecium, position 3.4      XIX       .4646
The outer whorl of the calyx and the inner whorl of the androecium, position 5.6      XX        .2134
The inner whorl of the calyx and the inner whorl of the androecium, position 3.4      XXI       .4634
The inner whorl of the calyx and the inner whorl of the androecium, position 5.6      XXII      .2519
The corolla and the inner whorl of the androecium, position 3.4                       XXIII     .2558
The corolla and the inner whorl of the androecium, position 5.6                       XXIV      .0903
The outer whorl of the androecium and the inner whorl of the androecium,
  position 3.4                                                                        XXV       .2661
The outer whorl of the androecium and the inner whorl of the androecium,
  position 5.6                                                                        XXVI      .1539


(a) The inner whorl of the androecium.

From the standpoint of the systematic botanist the most anomalous constituent
of the cruciferous flower is the inner whorl of the androecium, inasmuch as in each
of the positions where one stamen would naturally be expected, the presence of
two is regarded as typical. It has been explained in a previous section that
botanists now usually regard this anomaly as having arisen by collateral chorisis
from what was originally a single stamen in ancestral forms. For the sake of
conciseness and in order to avoid unnecessary repetition the following abbreviations
have been used in Tables P to X.


O. W. Ca. = Outer whorl of the calyx.
I. W. Ca. = Inner whorl of the calyx.
Ca.       = Calyx.
Co.       = Corolla.
O. W. A.  = Outer whorl of the androecium.
I. W. A.  = Inner whorl of the androecium.
A.        = Androecium.
The following Table, P, gives the correlation coefficients between the I. W. A.
and the other constituents or organs of the flower in order of position.


TABLE P.
I. W. A. and the other Constituents.

Constituent or Organ    Table    Correlation
O. W. Ca.               IV       .2613
I. W. Ca.               VIII     .3905
Ca.                     XII      .4014
Co.                     XV       .1773
O. W. A.                XVII     .1984

The highest correlation between the inner whorl of the androecium and the
other constituents or organs is that with the calyx; next in order come the inner
whorl of the calyx, the outer whorl of the calyx, the outer whorl of the androecium,
and lastly the corolla. In other words, we should be better able to predict the
number of stamens in the inner whorl of the androecium from the number of
members in the calyx than from the number of members in any other constituent
or organ.
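
The ranking just described can be reproduced with a short sketch. It takes the five coefficients of Table P and, assuming a simple linear (least-squares) predictor, reports r squared as the share of variance in the inner whorl of the androecium that each constituent would account for. The names `TABLE_P` and `rank_predictors` are illustrative, and the r squared reading is a standard property of linear regression rather than a computation made in the paper itself.

```python
# Correlations between the inner whorl of the androecium (I. W. A.)
# and the other constituents, as listed in Table P.
TABLE_P = {
    "O. W. Ca.": 0.2613,
    "I. W. Ca.": 0.3905,
    "Ca.": 0.4014,
    "Co.": 0.1773,
    "O. W. A.": 0.1984,
}


def rank_predictors(table):
    """Return (name, r, r_squared) tuples, best linear predictor first."""
    rows = [(name, r, r * r) for name, r in table.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranking = rank_predictors(TABLE_P)
for name, r, r2 in ranking:
    print(f"{name:10s} r = {r:.4f}   r^2 = {r2:.3f}")
```

On this reading even the best predictor, the calyx, would account for only about 16 per cent of the variance, so the ranking says which constituent is most informative, not that any of them is a strong predictor.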


(b) Relations between the organs themselves. 


Having thus discussed the relation of the inner whorl of the androecium to the
other organs and constituents, we may usefully proceed to determine
the “organic correlation” existing between the various organs themselves. In
this connection we have to consider the calyx, the corolla and the androecium, and
for this purpose the correlation Tables X, XIII and XVI have been prepared.
The character which has been selected for this study is the number of members
in each organ.
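
As an illustration of how a coefficient is obtained from a correlation table of this kind, the sketch below computes the product-moment correlation from a two-way table of counts. The function name `correlation_from_table` and the counts are hypothetical and invented for the example; they are not the data of Tables X, XIII or XVI.

```python
from math import sqrt


def correlation_from_table(freq):
    """Product-moment correlation from a bivariate frequency table.

    freq maps (x, y) -> count, e.g. (sepals, petals) -> number of flowers.
    """
    n = sum(freq.values())
    mean_x = sum(x * c for (x, _), c in freq.items()) / n
    mean_y = sum(y * c for (_, y), c in freq.items()) / n
    var_x = sum(c * (x - mean_x) ** 2 for (x, _), c in freq.items()) / n
    var_y = sum(c * (y - mean_y) ** 2 for (_, y), c in freq.items()) / n
    cov = sum(c * (x - mean_x) * (y - mean_y) for (x, y), c in freq.items()) / n
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL counts, invented for illustration only (not the paper's data).
hypothetical = {(4, 4): 60, (3, 4): 8, (4, 3): 6, (3, 3): 20, (5, 5): 6}
r = correlation_from_table(hypothetical)
print(round(r, 4))
```

A table in which the two counts always agree gives a coefficient of 1, and a table whose cells are the products of its margins gives 0; the observed tables fall between these extremes.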


The following Table (Q) shows the results obtained:

TABLE Q.

Correlation Coefficients between

Ca. and Co.    .6926
Ca. and A.     .5721
Co. and A.     .3174

(1) The calyx and corolla are much more highly correlated to one another than
is either of these with the androecium. In other words, the two protective organs
of the perianth are more highly correlated to one another than is either protective
organ with the male reproductive organ. It is further evident that (2) the calyx
is much more highly correlated to both the corolla and the androecium than are the
two last named to one another. From (1) it may be concluded that, on an average,
an increase or decrease from the accepted typical number, namely four, of petals in
the corolla is accompanied by an increase or decrease in the number of sepals
in the calyx; while from (2) an increase or decrease in the number of stamens in
the androecium will be accompanied, on an average, by a greater increase or decrease
in the number of sepals than in the number of petals.


(c) Relations between the constituents of organs. 


The constituents of (1) the calyx and (2) the androecium will now be considered.

(1) Calyx. The outer and inner whorls of this organ are inserted at different
levels and have a decussate arrangement, so that, although the organ as a whole
is protective in function, the two whorls actually help to enclose the flower at
right angles to one another. The correlation between these two whorls is an
extremely low one, namely, .1957 (Table I); in other words, an increase or decrease
in the number of sepals in either of the whorls of the calyx is associated only in
a very small degree with an increase or decrease in the number of sepals in the
other whorl. Or again it may be expressed thus: the two whorls of the calyx
vary to a great extent independently of one another. This statement should be
taken in conjunction with that made on p. 244 with regard to their Means and
Variabilities and should also be borne in mind when the correlations between
these two constituents and the other parts of the flower are discussed below
(see Tables R and S).


(2) Androecium. This organ is also composed of two whorls, an outer and
an inner, inserted at different levels. Its function is of course reproductive.
The correlation between the two constituents is very low, namely, .1984 (see
Table XVII), and is almost the same as that between the two whorls of the calyx.
The inner whorl of the androecium shows greater variability than the outer whorl
and tends to vary independently of this latter constituent, just as in the case of
the two whorls of the calyx.


Having thus considered the organs per se, let us now compare the correlations
between each individual constituent or organ and all the other constituents or
organs. For this purpose it will be necessary to tabulate the results in series and
consequently it might be well to start with the outermost constituent of the flower,
namely, the outer whorl of the calyx, and tabulate the correlation coefficients
passing inwards to the androecium. The inner whorl of the calyx will next be
taken in relation to the other constituents and so on.


TABLE R.
(d) Correlation Coefficients between the Outer Whorl
of the Calyx and

2nd Component    Table    r
I. W. Ca.        I        .1957
Co.              II       .7275
O. W. A.         III      .5886
I. W. A.         IV       .2613
A.               V        .4371


From the above table it will be seen that the outer whorl of the calyx is most
highly correlated with the corolla; it is also highly correlated with the outer whorl
of the androecium but much less so with the inner whorl of the androecium.


TABLE S.
(e) Correlation Coefficients between the Inner Whorl
of the Calyx and

2nd Component    Table    r
Co.              VI       .2476
O. W. A.         VII      .3229
I. W. A.         VIII     .3905
A.               IX       .4592

The low correlation between the inner whorl of the calyx and the corolla is due 
to the close adherence of the former to type, that is, there is very small variability. 


TABLE T.
(f) Correlation Coefficients between the Calyx and

2nd Component    Table    r
Co.              X        .6926
O. W. A.         XI       .6245
I. W. A.         XII      .4014
A.               XIII     .5721


There is a higher degree of correlation between the two organs of the perianth
than between the calyx and the androecium. The high correlation between the
calyx and the outer whorl of the androecium is mainly due to the high value
obtained for the correlation between the outer whorl of the calyx and the outer
whorl of the androecium.


TABLE U.
(g) Correlation Coefficients between the Corolla and

2nd Component    Table    r
O. W. A.         XIV      .4762
I. W. A.         XV       .1773
A.               XVI      .3174

The corolla is much more highly correlated with the outer whorl than with
the inner whorl of the androecium, and the correlation between the corolla and the
androecium as a whole is not very great.


A comparison of Tables T and U shows that there is a much greater correlation 
between the calyx and the androecium and its two whorls, than between the corolla
and the same constituents. 


So far we have considered the relationships between the different parts of the 
flower from the outside inwards, but when we examine these relationships, taking 
the inner whorls as our starting point, some new aspects of the problem become
manifest and, as these have been of great value in the interpretation of the results, 
it has been considered advisable to tabulate them thus: 


TABLE V.
(h) Correlation Coefficients between the Inner Whorl
of the Androecium and

2nd Component    Table    r
Ca.              XII      .4014
I. W. Ca.        VIII     .3905
O. W. Ca.        IV       .2613
O. W. A.         XVII     .1984
Co.              XV       .1773


TABLE W.
(i) Correlation Coefficients between the Outer Whorl
of the Androecium and

2nd Component    Table    r
Ca.              XI       .6245
O. W. Ca.        III      .5886
Co.              XIV      .4762
I. W. Ca.        VII      .3229


A comparison of Tables V and W shows that the correlations between the
outer whorl of the androecium and the other components are higher than for the
inner whorl of the androecium, except in the case of the inner whorl of the
calyx.
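
The comparison can be checked mechanically. The sketch below copies the coefficients of Tables V and W, leaves out the entry for the two androecial whorls with one another (which has no counterpart in Table W), and lists the components for which the outer whorl is not the more highly correlated. The helper name `exceptions` is illustrative only.

```python
# Coefficients from Table V (inner whorl of the androecium) and
# Table W (outer whorl of the androecium), keyed by the second component.
# The mutual O. W. A. / I. W. A. coefficient (.1984) is omitted because it
# describes the two whorls with each other, not with a third component.
TABLE_V = {"Ca.": 0.4014, "I. W. Ca.": 0.3905, "O. W. Ca.": 0.2613, "Co.": 0.1773}
TABLE_W = {"Ca.": 0.6245, "O. W. Ca.": 0.5886, "Co.": 0.4762, "I. W. Ca.": 0.3229}


def exceptions(outer, inner):
    """Components for which the outer whorl is NOT the more highly correlated."""
    return [name for name in outer if outer[name] <= inner[name]]


print(exceptions(TABLE_W, TABLE_V))  # ['I. W. Ca.']
```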


TABLE X.
(j) Correlation Coefficients between the
Androecium and

2nd Component    Table    r
Ca.              XIII     .5721
I. W. Ca.        IX       .4592
O. W. Ca.        V        .4371
Co.              XVI      .3174


This table shows that when the androecium is considered as a whole it is most
highly correlated with the calyx and least correlated with the corolla.


V. MORPHOLOGICAL SIGNIFICANCE OF THE STATISTICAL RESULTS. 


It is quite clear from the tabulated results that there is a definite departure 
from the usually accepted cruciferous structure in a very large number of the 
flowers of Lepidium Draba which have been examined for this study. This does 
not obtain merely in any one organ or constituent but in all the organs and 
constituents, although not to the same degree in each. 


The statistical results will now be examined from the standpoint of the 
botanist in order (a) to note their morphological or genetic significance and 
(b) in order to see whether these figures throw any light on the evolution of this 
cruciferous plant. 


It is almost axiomatic to state that the “purpose” of a flower is a purely 
reproductive one and that therefore its existence is justified only in so far as it 
serves to reproduce its kind. But not all the parts of a flower are solely
reproductive in function. Each individual consists of two parts, (1) Reproductive,
(2) Protective. (1) The reproductive organs are the gynaecium (♀) and the
androecium (♂), while (2) the protective organs (perianth) are the corolla and
the calyx.


One of the organs of the perianth, namely the corolla, is still further specialised. 
The calyx consists of four sepals, green in colour, whose sole function is to protect 
the flower when in the bud, and in many cases these are reflexed immediately after 
the flower has opened up, and are of no further importance to it. On the other 
hand the petals though essentially sepal-like in structure, in this as in the great 
majority of flowers, are not green but of some other colour. In the species under 
consideration they are white. Now although the petals are of great importance
in protecting the reproductive organs while in the bud, their utility does not cease
when the flower opens, for, along with small nectaries at the base of the stamens,
they serve as an attraction for insects whose visits are essential for cross-fertilisation.


The reproductive organs of what is regarded as the typical cruciferous flower
consist of (1) the gynaecium which is composed of two carpels and (2) the androecium
which is composed of six stamens. The stamens are delicate structures and do not
hold an isolated position in the flower. When in the bud and immature they are 
subject to external influences, for example, (1) they might be shrivelled up by the 
heat of the sun, (2) they might be blasted by rain or wind or (3) they might be 
attacked by herbivorous insects, so that the protective perianth plays an important 
part in flower economics. Now what does an increase in the number of stamens 
imply? It is obvious that if the number of stamens is increased the total volume 
occupied by the reproductive organs is increased and consequently a tax is put 
upon the protective organs if they are to fulfil their function adequately. If the 
perianth does not respond to this tax from space considerations, the reproductive 
organs stand a small chance of ever fulfilling their function, so that one would 
naturally expect that variation of some kind in the perianth would follow variation 
in the reproductive organs. 


Another important point which must never be lost sight of when interpreting
the statistical results is the symmetry of the cruciferous flower. The calyx consists
of two whorls each with two sepals; the corolla of one whorl of four petals and the
androecium of two whorls of stamens, the outer having two members and the inner
four members (see Plate I, fig. 7). Consequently a cruciferous flower is bilaterally
symmetrical only on that vertical plane which passes through the division wall of
the carpels, between each of the pairs of stamens in the inner whorl, between two
petals on either side and through the middle of the outer pair of sepals. This
plane may be referred to as the “plane of symmetrical division.” Owing to the
fact that the corolla consists of only one whorl, the outer whorl of the calyx
corresponds in position to the inner whorl of the androecium, and the inner whorl
of the calyx to the outer whorl of the androecium.


From a study of the Means and Standard Deviations of the various organs and 
constituents we arrive at the following conclusions : 


Calyx. (1) The greatest approach to constancy in number in the whole 
flower is in the inner whorl of the calyx. 


(2) There is a significant difference between the means of the two whorls. 


(3) There is much greater variability in the outer than in the inner whorl of 
the calyx and on an average it possesses fewer sepals. 


(4) There is a tendency towards a reduction from type in the number of 
sepals in the calyx. 


Corolla. (5) There is a tendency towards a reduction from the accepted 
typical number in the number of petals in the corolla. 


Androecium. (6) There is a significant difference between the means of the
distributions of
(a) the members of the two whorls of the androecium,
(b) the members of the two positions in the inner whorl,
and (c) the members of the two positions in the outer whorl.


(7) From whatever axis we view the androecium as an organ it is distinctly
asymmetrical in the distribution of its functioning stamens.


(8) There is much greater variability in the inner whorl than in the outer
whorl of the androecium.


(9) In both positions in the inner whorl of the androecium there is a tendency
towards a reduction from the accepted typical number of stamens and in the
position where this is most marked there is the greatest variability.


The interpretation of these results is not at first sight very evident. 


Why should there be a tendency towards a reduction in the number of
members in the different organs of the flower and why should this tendency
be most marked in the inner whorl of the androecium? As has already been
pointed out all the flowers examined were taken from a single plant which gave
rise to new stems by means of buds on the roots. May this tendency to reduction
in the parts of the flower whose function is sexual reproduction not be an expression
of a tendency towards an elimination of sexual in favour of vegetative reproduction?
Another phenomenon which lends support to this hypothesis is the fact that in
this plant the percentage of “pods” which attain maturity is extremely small.


Whether there is or is not a tendency towards vegetative reproduction, may 
we not also have here a harking back towards an ancestral form in which the 
number was less than the at present accepted typical number? In fact one would 
expect that if the present constitution of the inner whorl of the androecium had
been most recent in development, reversion would first take place in it, and 
conversely one might reasonably conclude that since this whorl shows greatest 
variability, and most marked tendency to reduction in the number of members, it 
is more than probable that its present constitution was arrived at by an increase in 
number from a more primitive type. 


Let us now examine the deductions made from a study of the correlation 
coefficients and see if they have any morphological interpretation. 


(1) The calyx and corolla are more highly correlated with one another than is
either of these with the androecium.


(2) The calyx is more highly correlated with the androecium than is the corolla.
In other words, the two protective parts are more intimately associated in increase
or decrease with one another than is either of these with the male reproductive
organs, and further the calyx which is solely protective in function is more
intimately correlated with the male reproductive organs than is the corolla which
serves as an attraction for insects as well as a protective covering of the bud.


(3) The two whorls of the calyx are not highly correlated, i.e. they vary inde- 
pendently of one another. 


(4) The two whorls of the androecium also are not highly correlated.
Morphologically this means that when there are two constituents in one organ, each having
the same function, they may vary independently of one another, so that although
an increase or decrease in the number in either may be correlated with an increase
or decrease in the number in any other constituent of the flower, the same does
not hold true with regard to the two constituents.


(5) The outer whorl of the calyx is most highly correlated with the corolla,
next with the outer whorl of the androecium and lastly with the inner whorl of
the androecium. The reason why the outer whorl of the calyx is more highly
correlated with the outer whorl than with the inner whorl of the androecium is
not at first sight very evident, but may be explained on the basis of its protective
power. The members of the outer whorl of the androecium lie in a plane parallel
to that of the outer whorl of the calyx, and are much more widely separated in
this plane than are the members of the inner whorl of the androecium.
Consequently any increase in the number of stamens in the outer whorl would involve
a much greater increase in volume within the flower than a corresponding increase
in the number of stamens in the inner whorl. Thus we are not surprised to find
that such an increase in the outer whorl of the androecium is more intimately
associated with an increase in the outer whorl of the calyx than a corresponding
increase in the inner whorl of the androecium would be.


(6) There is very low variability in the inner whorl of the calyx and it is
almost equally correlated to the two whorls of the androecium. The morphological
explanation of these facts follows as a corollary to that given above.


(7) The calyx is much more highly correlated with the androecium as a whole
and with its two whorls than is the corolla.


As we have already said the calyx is the predominantly protective organ and 
consequently this higher correlation has a physical basis. The corolla being partly 
attractive does not enter so closely into space economics. 


(8) The outer whorl of the androecium is more highly correlated with the
other components of the flower than is the inner whorl of the androecium. This
again follows on the basis of space considerations. Any increase in the number
of members in the inner whorl of the androecium does not involve so radical a
change in the volume of the flower as does a corresponding increase in the outer
whorl of the androecium.


VI. VARIATION IN THE GYNAECIUM.


So far we have not considered the gynaecium on account of the small number
of variations which occur in that organ and from the fact that these do not lend
themselves to statistical treatment.


The gynaecium consists typically of two carpels which are flattened in a vertical
plane parallel to those containing the pairs of stamens in the inner whorl of the
androecium. The thin partition wall separating the two carpels therefore stands
at right angles to this plane.


Now when we examine the different types of variations in the structure and
number of the carpels we find the following: (1) a single carpel, (2) two carpels
(typical), (3) three carpels, (4) four carpels, (5) two sets of two carpels within
a single perianth, (6) two sets of two carpels within separate perianths but on one
pedicel.

Let us now proceed to examine each of these in some detail.

(1) The gynaecium consists of a single carpel (see Figs. LXXXVIII to XCII).


In all these cases, except LXXXVII, as will be at once seen by reference to the
figures, the suppression of a carpel is accompanied by the suppression of some of
the members of nearly all the other organs, thus:


In LXXXVIII two petals are absent and one stamen is aborted. 
In LXXXIX one sepal, two petals and two stamens are absent. 
In XC, XCI and XCII all the organs are deficient in members. 


A noteworthy phenomenon in this respect also is that the suppression of 
members which accompanies the suppression of a carpel is usually in the vertical 
plane which passes through the plane of separation of the carpels. 


(2) The gynaecium consists of two carpels.


This is the accepted typical structure and the statistical study deals with these 
in detail. 


(3) The gynaecium consists of three carpels (see Figs. XCIII and CII).


When three carpels occur in the gynaecium they are never found co-laterally,
i.e. the additional carpel is never found with its origin at the side of a carpel, but
always arising from the plane of separation, which is in the plane of greatest
variability.


(4) The gynaecium consists of four carpels (see Fig. CI).


Just as in the previous case the increase in the number of carpels takes place
in the plane of separation of the carpels, one on either side, so that a cruciate
structure is found. A reference to Fig. CI will make this clear. In both of these
groups it will be evident that an increase in the female reproductive organs is
associated not only with an increase in the male reproductive organs but also with
an increase in the protective organs or perianth.


(5) The gynaecium consists of two sets of carpels within a single perianth.


This is rather an anomalous group but is extremely interesting inasmuch as it 
contains a series of annectant forms linking group 2 to group 6. What we actually 
have here is a complete reduplication of the reproductive organs encased within 
a single series of protective organs. In some of the flowers examined with this 
structure it was rather difficult to determine the orientation owing to a torsion 
of the thalamus, but in the types figured on Plates IX and X (Fig. XCIV and 
Figs. XCV et seq.) the mode of origin of these is quite evident. Several 
important observations on these forms may be stated. 


(a) There are really two complete sets of reproductive organs and in one case
(see Fig. XCVII) each of these is of the typical cruciferous structure.


(b) Increase in the number of the reproductive organs is accompanied by an 
increase in the number of members in the protective organs. 


(c) The increase in the number of members of the reproductive organs is for 
the most part in the plane of division of the carpels, in other words, in the outer 
whorl of the calyx and its associated petals. 


(d) This is also the plane along which the separation of the reproductive 
organs has taken place. 


(e) This plane is the one which we have already shown in the statistical part 
to be the plane of greatest variability. 


(6) The gynaecium consists of two sets of two carpels within separate perianths
but on one pedicel.


In this group we reach the limit of variability in the material examined. In
place of a single flower consisting of calyx, corolla, androecium and gynaecium we
actually find two complete sets of all these organs, on one pedicel (see Figs. CIII
to CVIII), while in one case (Fig. CIII) each of the two flowers has the typical
cruciferous structure, so that were each of these separately examined it would
undoubtedly be regarded as a normal flower. Yet we must bear in mind that,
botanically considered, one flower and one flower only arises from a pedicel. Were
this, therefore, an isolated example, and if no annectant forms existed, the departure
might well be regarded as a “mutation,” but a consideration of the numerous
variations which we have already considered, taken in conjunction with group 5,
only serves to emphasise the fact that “the vertical plane which passes through
the partition wall of the two carpels and consequently separates the individuals
of the pairs of stamens in the inner whorl and passes through the centres of the
sepals of the outer whorl of the calyx is a plane along which this flower is in a
state of flux and is the plane in which it is probable that the flower has changed,
and is still changing, from some quite different ancestral form.”


VII. SUGGESTIONS FOR FUTURE STUDIES IN THIS PLANT.


It must be very obvious to anyone who has perused this paper that the results 
which might be obtained from a study of this plant are by no means exhausted. 
An attempt, however, has been made to interpret the variability in its flowers, 
both from a morphological and an evolutionary standpoint. Studies of a different 
nature might be undertaken in order to test the results obtained, e.g. : 


(1) What is the degree of fertility in the flowers of this plant? For this 
purpose it would be necessary to find the percentage of flowers which produce 
fertile seed. 


(2) What are the variants, if any, which are associated with infertility?


(3) What are the characters of the flowers which are produced from the seeds 
of the different variants? If seeds selected from the different variants were grown 
separately and self-fertilised, one could trace the variations in the flowers of the 
next generation and see to what extent the different variations were transmitted. 
This study is capable of much elaboration and is one which would be fraught with 
great possibilities. It seems to involve a satisfactory method of determining how 
far these variations are concerned in plant economics, and also to what extent 
they have been instrumental in the evolution of the Order Cruciferae.


EXPLANATION OF FIGURES 8, 9 AND 10. PLATE I.


FIGURE 8.
(a) Typical stamen (outer whorl).
(b) Stamen with half-length filament and complete anther (outer whorl).
(c) Typical stamen (inner whorl).
(d) Non-chorised stamen with two complete anthers (inner whorl).
(e) Stamen of inner whorl with two complete anthers but only chorised in the upper half.


FIGURE 9.
(a) Aborted stamen of outer whorl, i.e. filament with no anther.
(b) Absence of stamen in outer whorl.
(c) Full-length filament in inner whorl with no anther.
(d) Half-length filament in inner whorl with complete anther.
(e) Half-length filament in inner whorl with no anther.
(f) Non-chorised stamen in inner whorl with half-length filament but with two complete anthers.


FIGURE 10.
(a) Normal sepal.
(b) Sepal divided almost to the very base.
(c) Sepal completely divided into two distinct sepals.
(d) Sepal absent.
(e) Normal petal.
(f) Petal divided almost to the very base.
(g) Aborted petal.
(h) Petal absent.


ONCE MORE ON “THE ELIMINATION OF SPURIOUS
CORRELATION DUE TO POSITION IN TIME OR SPACE.”


By O. ANDERSON, St. Petersburg, Russia.


1. In the April number of Biometrika, “Student” has shown* that the procedure
proposed by Cave and Hooker for freeing the correlation coefficient of two oscillating
variables from the evolutionary element by computing first differences (that is, by
replacing the series $x_1, x_2, \ldots, x_n$ by the series $\Delta' x_1 = x_1 - x_2$,
$\Delta' x_2 = x_2 - x_3$, ..., $\Delta' x_{n-1} = x_{n-1} - x_n$) admits of a
generalisation. Strictly speaking, the procedure is correct only when the evolutionary
component can be represented by a linear equation. If this is not the case, so that,
for example, only a parabolic equation of higher order will satisfy it, then one must
take second, third, etc. differences (that is, instead of $\Delta' x_1, \Delta' x_2, \ldots$
one takes $\Delta'' x_1 = \Delta' x_1 - \Delta' x_2$, $\Delta'' x_2 = \Delta' x_2 - \Delta' x_3$,
etc.) and then compute correlation coefficients. These may soon reach a
constant limiting value, which represents the desired result.
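
The behaviour described here can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes two series made of a quadratic trend plus random noise, with the two noise terms correlated at an assumed value `rho`, and prints the correlation of the raw series and of their first, second and third differences. All names (`pearson`, `difference`, `simulate`) and all numerical settings are illustrative choices.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def difference(series, order):
    """Apply x_i - x_{i+1} repeatedly, following the convention in the text."""
    for _ in range(order):
        series = [a - b for a, b in zip(series, series[1:])]
    return series


def simulate(n=2000, rho=0.5, seed=1):
    """Two series: quadratic trend plus noise correlated at rho (assumed setup)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for t in range(n):
        u, w = rng.gauss(0, 1), rng.gauss(0, 1)
        v = rho * u + sqrt(1 - rho ** 2) * w   # corr(u, v) = rho by construction
        xs.append(0.001 * t * t + u)           # evolutionary part + oscillation
        ys.append(0.002 * t * t + v)
    return xs, ys


xs, ys = simulate()
by_order = {k: pearson(difference(xs, k), difference(ys, k)) for k in range(4)}
for k, r in by_order.items():
    print(f"order {k}: r = {r:.3f}")
```

With these assumed settings the raw series should be almost perfectly correlated because of the shared trend, the first differences should still show an inflated value because a linear remnant of the trend survives, and from the second difference onward the coefficient should settle near the assumed `rho`.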


The undersigned arrived at similar conclusions about 2 years ago. For reasons
beyond his control, however, he has until now been prevented from printing his
memoir on the subject. Since in his investigation he follows paths very different
from those of “Student”, and also reaches a number of conclusions which seem to
have remained unknown to the latter, a brief account of the most important results
of his investigation may perhaps be of some interest to the readers of Biometrika.


2. Method. The English statistical school neglects in its investigations a
procedure which is often applied by Russian and German scholars (Tchebycheff,
Markoff, v. Bortkiewicz, etc.) and which, besides great rigour and exactness, has
the further advantage of being quite elementary: namely, the method of
mathematical expectation. The mathematical expectation of a quantity ($A$)
means, as is well known, the product of this quantity and its probability ($w$),
that is, $Aw$. If a variable can take a series of mutually exclusive values, its
mathematical expectation is defined as the sum of the expectations of all these
values. We shall denote the mathematical expectation throughout by the symbol
$E(\ )$. $E(A)$ is thus, for example, equal to $Aw$.


* Biometrika, Vol. X. Part 1, p. 179, “The Elimination of Spurious Correlation due to Position in
Time or Space.” By “Student.”


The principal theorems on mathematical expectations may be assumed to be
known. In order, however, to make it easier to check the formulae of this
paper, we shall briefly indicate here the theorems most important for us:


(1) $E(x + y - z + u - t \ldots) = E(x) + E(y) - E(z) + E(u) - E(t) \ldots$

(2) If $x, y, z, \ldots$ are independent of one another, then
$E(x \cdot y \cdot z \ldots) = E(x) \cdot E(y) \cdot E(z) \ldots$

(3) $E(kx) = k\,E(x)$, where $k$ is constant; and hence also
$E(k) = k$.

(4) If a variable $X$ can take the values $x_1, x_2, \ldots, x_n$, then
the probability $W$ that the difference $x_i - E(x)$ is contained between the limits
$-a\sqrt{E(x^2) - [E(x)]^2}$ and $+a\sqrt{E(x^2) - [E(x)]^2}$ is greater than
$1 - \dfrac{1}{a^2}$,
where $a$ must be greater than 1 (a theorem of Tchebycheff).
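
Theorems (3) and (4) can be checked exactly on a small discrete distribution. The sketch below assumes a fair six-sided die as the example, which is not taken from the paper; `expectation` and `chebyshev_check` are illustrative helper names.

```python
from fractions import Fraction
from math import sqrt


def expectation(dist, f=lambda v: v):
    """E[f(X)] for a discrete distribution given as {value: probability}."""
    return sum(p * f(v) for v, p in dist.items())


# Assumed example: a fair six-sided die.
die = {v: Fraction(1, 6) for v in range(1, 7)}

mean = expectation(die)                                     # E(x)
variance = expectation(die, lambda v: v * v) - mean ** 2    # E(x^2) - [E(x)]^2


def chebyshev_check(dist, a):
    """Return (actual probability, lower bound 1 - 1/a^2) for a multiplier a > 1."""
    mu = expectation(dist)
    sigma = sqrt(expectation(dist, lambda v: v * v) - mu ** 2)
    inside = sum(p for v, p in dist.items() if abs(v - mu) < a * sigma)
    return float(inside), 1 - 1 / a ** 2


for a in (1.2, 1.5, 2.0):
    actual, bound = chebyshev_check(die, a)
    print(f"a = {a}: P = {actual:.3f} > {bound:.3f}")
```

The bound holds for any distribution, so it is usually far from tight; for the die the actual probabilities lie well above the guaranteed minimum.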


In our investigation we shall everywhere compute the mathematical expectation
of a quantity instead of its most probable value.


Let us first determine how the correlation coefficient of two oscillating
series behaves when their terms are replaced by the differences $\Delta' x, \Delta'' x, \Delta''' x, \ldots$;
$\Delta' y, \Delta'' y, \Delta''' y, \ldots$, and then examine the question of the limits
of applicability of the generalised Cave-Hooker method. To save space
we shall give only the final results of the calculations,
except for the first 3 formulae, whose derivation may serve as an example of the
method of calculation.


3. Definition. By an oscillatory series we shall understand a series

\[ x_1, x_2, x_3, \ldots, x_i, \ldots, x_n \]

for which

\[ E(x_1) = E(x_2) = \ldots = E(x_i) = \ldots = E(x_n) = E(x) = \text{const.} \]

and all the individual terms are completely independent of one another, so that
$E(x_i x_j) = E(x_i) \cdot E(x_j)$, where $i \neq j$.


Such conditions would be satisfied, for example, by a series whose terms
represent the results of a series of trials with constant probability, say the
results of drawings from an urn containing $m$ white and $n$ black balls.
4. Mean square error.
Denoting $x_i - E(x)$ by $\xi_i$, we have

\[ E(\xi_i) = E[x_i - E(x)] = E(x) - E(x) = 0. \]

Since the individual $\xi$ are completely independent of one another,

\[ E(\xi_i \xi_j) = E(\xi_i) \cdot E(\xi_j) = 0. \]

The mean square error of the series $x$ is equal to

\[ \frac{\sum_{1}^{n} [x_i - E(x)]^2}{n}. \]

Its mathematical expectation we shall call (not quite in agreement with the
usual notation) $\sigma_x^2$:

\[ \sigma_x^2 = E\left[\frac{\sum_1^n [x_i - E(x)]^2}{n}\right]
 = \frac{1}{n} E\left[\sum_1^n x_i^2\right] - \frac{2}{n} E(x)\, E\left[\sum_1^n x_i\right] + [E(x)]^2
 = E(x^2) - [E(x)]^2, \]

an expression which stands under the square-root sign in Theorem 4 (§ 2) above.


On the other hand, $E\left[\dfrac{\sum_1^n [x_i - E(x)]^2}{n}\right]$ is equal to
$E\left[\dfrac{\sum_1^n \xi_i^2}{n}\right]$, and therefore

\[ \sigma_x^2 = E(\xi^2). \]


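Let us examine the expression $E\left[\sum_1^n (x_i - M_x)^2\right]$, where $M_x$
denotes the arithmetic mean of the series $x$, that is, $\dfrac{\sum_1^n x_i}{n}$. Since

\[ M_x = \frac{E(x) + \xi_1 + E(x) + \xi_2 + \ldots + E(x) + \xi_n}{n} = E(x) + \frac{\sum_1^n \xi_i}{n} \]

and

\[ x_i - M_x = E(x) + \xi_i - E(x) - M_\xi = \xi_i - M_\xi \]

(writing $M_\xi$ for $\dfrac{\sum_1^n \xi_i}{n}$), we have:

\[ E\left[\sum_1^n (x_i - M_x)^2\right] = E\left[\sum_1^n (\xi_i - M_\xi)^2\right] = E\left[\sum_1^n \xi_i^2 - n M_\xi^2\right] \]
\[ = nE(\xi^2) - nE(M_\xi^2) = nE(\xi^2) - nE\left[\frac{\sum \xi_i^2 + 2\sum\sum \xi_i \xi_j}{n^2}\right] = nE(\xi^2) - E(\xi^2), \]

\[ E\left[\sum_1^n (x_i - M_x)^2\right] = (n-1)\,E(\xi^2). \]

To obtain $E(\xi^2)$, this expression must be divided by $(n-1)$.
Hence also

\[ \sigma_x^2 = E\left[\frac{\sum_1^n (x_i - M_x)^2}{n-1}\right]. \]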
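To obtain the squared error $\sigma^2_{\Delta'x}$ for the first difference
$\Delta'x$, we note that

\[ \Delta'x_i = (x_i - x_{i+1}) = (\xi_i - \xi_{i+1}) \]

and $E(\Delta'x_i) = E(\xi_i) - E(\xi_{i+1}) = 0$.
Hence we have:

\[ \sigma^2_{\Delta'x} = E\left[\frac{\sum_1^{n-1} [\Delta'x_i - E(\Delta'x)]^2}{n-1}\right]
 = E\left[\frac{\sum_1^{n-1} (\xi_i - \xi_{i+1})^2}{n-1}\right] \]
\[ = \frac{1}{n-1}\left\{ E\left[\sum_1^{n-1} \xi_i^2\right] - 2E\left[\sum_1^{n-1} \xi_i \xi_{i+1}\right] + E\left[\sum_2^{n} \xi_i^2\right] \right\} \]
\[ = \frac{1}{n-1}\left\{ (n-1)E(\xi^2) - 0 + (n-1)E(\xi^2) \right\} = 2E(\xi^2). \]

Thus $\sigma^2_{\Delta'x} = 2\sigma_x^2$.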
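By the same scheme of calculation there results for the mean square error

of the second difference $\Delta''x_i$ the expression $6\sigma_x^2$,
of the third difference $\Delta'''x_i$ the expression $20\sigma_x^2$,
of the fourth difference $\Delta^{iv}x_i$ the expression $70\sigma_x^2$,
of the $k$-th difference $\Delta^{(k)}x_i$ the expression $\dfrac{(2k)!}{k!\,k!}\,\sigma_x^2$.

We can therefore set up the following equation

\[ \sigma_x^2 = \frac{\sigma^2_{\Delta'x}}{2} = \frac{\sigma^2_{\Delta''x}}{6} = \frac{\sigma^2_{\Delta'''x}}{20} = \frac{\sigma^2_{\Delta^{iv}x}}{70} = \ldots = \frac{\sigma^2_{\Delta^{(k)}x}}{\dfrac{(2k)!}{k!\,k!}} \qquad (1) \]

which is exact, and the following

\[ \frac{\sum_1^{n} (x_i - M_x)^2}{n-1} = \frac{\sum_1^{n-1} \Delta'x_i^2}{2(n-1)} = \frac{\sum_1^{n-2} \Delta''x_i^2}{6(n-2)} = \frac{\sum_1^{n-3} \Delta'''x_i^2}{20(n-3)} = \frac{\sum_1^{n-4} \Delta^{iv}x_i^2}{70(n-4)} = \ldots = \frac{\sum_1^{n-k} \Delta^{(k)}x_i^2}{\dfrac{(2k)!}{k!\,k!}\,(n-k)}\;{}^{*} \qquad (1a) \]

which is only approximately correct.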


* It is more advantageous to compute $\dfrac{\sum_1^{n-k} (\Delta^{(k)}x_i)^2}{n-k}$ and not
the corresponding expression formed with the deviations
$\Delta^{(k)}x_i - M_{\Delta^{(k)}x}$ from the mean.
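
The factors 2, 6, 20, 70, ... of equation (1) can be verified by a short computation. The Python sketch below is an added illustration, not part of the original derivation, and its function names are hypothetical. It forms the coefficients of the $k$-th finite difference by repeated differencing; for independent terms with a common variance, the sum of the squared coefficients is the factor by which the variance is multiplied.

```python
from math import comb

def difference_coefficients(k):
    """Coefficients c_0..c_k with Delta^(k) x_i = sum_j c_j * x_{i+j},
    using the convention Delta' x_i = x_i - x_{i+1} of the text."""
    coeffs = [1]
    for _ in range(k):
        padded = [0] + coeffs + [0]
        # One further differencing of the coefficient sequence.
        coeffs = [padded[j + 1] - padded[j] for j in range(len(coeffs) + 1)]
    return coeffs

def variance_multiplier(k):
    """Factor by which sigma_x^2 is multiplied for the k-th difference of an
    oscillatory series (independent terms, constant expectation)."""
    return sum(c * c for c in difference_coefficients(k))

for k in range(5):
    # The second and third columns agree: sum of squares equals (2k)!/(k! k!).
    print(k, variance_multiplier(k), comb(2 * k, k))
```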
5. The mean product $p_{xy}$.
If two series

\[ x_1, x_2, x_3, \ldots, x_n \]

and

\[ y_1, y_2, y_3, \ldots, y_n \]

are both oscillatory in the sense of § 3, and a correlation can exist only
between terms with equal indices, that is, between $x_1$ and $y_1$, $x_2$ and
$y_2$, $x_3$ and $y_3$, etc., it is easily seen that

\[ E\{[x_i - E(x)][y_j - E(y)]\} = 0, \quad\text{when } i \neq j. \]

Denoting $y_i - E(y)$ by $\eta_i$ and $x_i - E(x)$ again by $\xi_i$, we easily
find the following expressions:

\[ p_{xy} = E\left[\frac{\sum_1^n \xi_i \eta_i}{n}\right] = E(\xi_i \eta_i), \]

\[ E\left[\frac{\sum_1^n (x_i - M_x)(y_i - M_y)}{n-1}\right] = E(\xi_i \eta_i) = p_{xy}, \]

\[ p_{\Delta'x,\Delta'y} = E\left[\frac{\sum_1^{n-1} \Delta'x_i\, \Delta'y_i}{n-1}\right] = 2\,p_{xy}, \]

\[ \ldots \]

\[ p_{\Delta^{(k)}x,\Delta^{(k)}y} = E\left[\frac{\sum_1^{n-k} \Delta^{(k)}x_i\, \Delta^{(k)}y_i}{n-k}\right] = \frac{(2k)!}{k!\,k!}\, p_{xy}. \]


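We can therefore again set up two systems of equations:

\[ p_{xy} = \frac{p_{\Delta'x,\Delta'y}}{2} = \frac{p_{\Delta''x,\Delta''y}}{6} = \frac{p_{\Delta'''x,\Delta'''y}}{20} = \frac{p_{\Delta^{iv}x,\Delta^{iv}y}}{70} = \ldots = \frac{p_{\Delta^{(k)}x,\Delta^{(k)}y}}{\dfrac{(2k)!}{k!\,k!}} \qquad (2) \]

and

\[ \frac{\sum_1^{n} (x_i - M_x)(y_i - M_y)}{n-1} = \frac{\sum_1^{n-1} \Delta'x_i \Delta'y_i}{2(n-1)} = \frac{\sum_1^{n-2} \Delta''x_i \Delta''y_i}{6(n-2)} = \frac{\sum_1^{n-3} \Delta'''x_i \Delta'''y_i}{20(n-3)} \]
\[ = \frac{\sum_1^{n-4} \Delta^{iv}x_i \Delta^{iv}y_i}{70(n-4)} = \ldots = \frac{\sum_1^{n-k} \Delta^{(k)}x_i \Delta^{(k)}y_i}{\dfrac{(2k)!}{k!\,k!}\,(n-k)} \qquad (2a) \]

of which the first is exact and the second approximate, and which are exactly
analogous to the equations for $\sigma_x^2$ (and hence also for $\sigma_y^2$) of § 4.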
6. The squared error of the squared errors.

Let us now consider the range of the fluctuations of the quantities of systems
(1a) and (2a) about their mathematically expected values in (1) and (2). In
other words, let us proceed (having regard to Theorem 4, § 2) to the
mathematical expectation of the squared errors of the quantities named.


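For $\dfrac{\sum_1^n [x_i - E(x)]^2}{n}$ the squared error is found to be

\[ \frac{E(\xi^4) - [E(\xi^2)]^2}{n}. \]

For $\dfrac{\sum_1^n [x_i - M_x]^2}{n-1}$ the squared error is found to be

\[ \frac{E(\xi^4) - [E(\xi^2)]^2}{n} + \frac{2[E(\xi^2)]^2}{n(n-1)}. \]

For $\dfrac{\sum_1^{n-1} [\Delta'x_i]^2}{2(n-1)}$ the squared error is found to be

\[ \frac{(2n-3)\{E(\xi^4) - [E(\xi^2)]^2\} + 2(n-1)[E(\xi^2)]^2}{2(n-1)^2}. \]

For $\dfrac{\sum_1^{n-2} [\Delta''x_i]^2}{6(n-2)}$ the squared error is found to be

\[ \frac{(9n-23)\{E(\xi^4) - [E(\xi^2)]^2\} + (17n-42)[E(\xi^2)]^2}{9(n-2)^2}. \]

And finally for $\dfrac{\sum_1^{n-k} (\Delta^{(k)}x_i)^2}{\dfrac{(2k)!}{k!\,k!}\,(n-k)}$
there results the rather complicated expression:

\[ \frac{1}{\left[\dfrac{(2k)!}{k!\,k!}\,(n-k)\right]^2} \Big\{ A_0^2\,(n-2k)\,\{E(\xi^4) - [E(\xi^2)]^2\} \]
\[ \quad + 4[E(\xi^2)]^2 \left[ A_1^2 (n-2k+1) + A_2^2 (n-2k+2) + A_3^2 (n-2k+3) + \ldots + A_k^2 (n-2k+k) \right] \]
\[ \quad + 2B_0^2 \{E(\xi^4) - [E(\xi^2)]^2\} + 8[E(\xi^2)]^2 \left[ B_1^2 + B_2^2 + \ldots + B_{k-1}^2 \right] \Big\}. \]

If $b_0, b_1, b_2, \ldots, b_k$ denote the coefficients of the expansion of the
binomial $(1+1)^k$, so that $b_0 = 1$, $b_1 = k$, $b_2 = \dfrac{k(k-1)}{1 \cdot 2}$,
etc., then here

\[ A_0^2 = (b_0^2 + b_1^2 + b_2^2 + \ldots + b_k^2)^2 = \left[\frac{(2k)!}{k!\,k!}\right]^2, \]

\[ A_1^2 = (b_0 b_1 + b_1 b_2 + b_2 b_3 + \ldots + b_{k-1} b_k)^2 = \left[\frac{(2k)!}{(k-1)!\,(k+1)!}\right]^2, \]

\[ A_2^2 = (b_0 b_2 + b_1 b_3 + \ldots + b_{k-2} b_k)^2 = \left[\frac{(2k)!}{(k-2)!\,(k+2)!}\right]^2, \]

\[ \ldots \]

\[ A_j^2 = (b_0 b_j + b_1 b_{j+1} + \ldots + b_{k-j} b_k)^2 = \left[\frac{(2k)!}{(k-j)!\,(k+j)!}\right]^2, \]

\[ \ldots \]

\[ B_0^2 = (b_0^2)^2 + (b_0^2 + b_1^2)^2 + (b_0^2 + b_1^2 + b_2^2)^2 + \ldots + (b_0^2 + b_1^2 + b_2^2 + \ldots + b_{k-1}^2)^2, \]

\[ B_1^2 = (b_0 b_1)^2 + (b_0 b_1 + b_1 b_2)^2 + (b_0 b_1 + b_1 b_2 + b_2 b_3)^2 + \ldots + (b_0 b_1 + b_1 b_2 + b_2 b_3 + \ldots + b_{k-2} b_{k-1})^2, \]

\[ B_j^2 = (b_0 b_j)^2 + (b_0 b_j + b_1 b_{j+1})^2 + (b_0 b_j + b_1 b_{j+1} + b_2 b_{j+2})^2 + \ldots + (b_0 b_j + b_1 b_{j+1} + b_2 b_{j+2} + \ldots + b_{k-j-1} b_{k-1})^2, \]

\[ \ldots \]

\[ B_{k-1}^2 = (b_0 b_{k-1})^2. \]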
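
The general expression of this section is easy to mis-copy, so a mechanical check may be useful. The following Python sketch is an added illustration under stated assumptions, not the author's own computation: it evaluates the general formula from the coefficients $A_j$ and $B_j$ as defined above, assuming $n \geq 2k$, with `mu4` standing for $E(\xi^4)$ and `sigma2` for $E(\xi^2)$ (both names hypothetical). For $k = 1$ and $k = 2$ it should reproduce the two special formulas given earlier in this section.

```python
from fractions import Fraction
from math import comb

def squared_error_of_variance_estimate(n, k, mu4, sigma2):
    """Squared error of sum((Delta^(k) x_i)^2) / (C(2k,k) * (n-k)) for an
    oscillatory series, built from the A and B coefficients (assumes n >= 2k)."""
    b = [comb(k, j) for j in range(k + 1)]          # binomial coefficients of (1+1)^k
    # A_j = b_0 b_j + b_1 b_{j+1} + ... + b_{k-j} b_k
    A = [sum(b[l] * b[l + j] for l in range(k - j + 1)) for j in range(k + 1)]

    def B_sq(j):
        # B_j^2 = sum of squared partial sums of b_m * b_{m+j}, m = 0..k-j-1
        total, partial = 0, 0
        for m in range(k - j):
            partial += b[m] * b[m + j]
            total += partial ** 2
        return total

    excess = mu4 - sigma2 ** 2                      # E(xi^4) - [E(xi^2)]^2
    braces = (A[0] ** 2 * (n - 2 * k) * excess
              + 4 * sigma2 ** 2 * sum(A[j] ** 2 * (n - 2 * k + j) for j in range(1, k + 1))
              + 2 * B_sq(0) * excess
              + 8 * sigma2 ** 2 * sum(B_sq(j) for j in range(1, k)))
    return Fraction(braces) / (comb(2 * k, k) * (n - k)) ** 2

# Example: first differences, n = 10, E(xi^4) = 5, E(xi^2) = 1.
print(squared_error_of_variance_estimate(10, 1, 5, 1))
```

Exact rational arithmetic is used so that the comparison with the closed forms is not blurred by rounding.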


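If the distribution of the $x$ is "normal" (and this is the case of greatest
interest to us), the above formulas can be considerably simplified. Since in
this case $E(\xi^4)$ may be set equal to $3[E(\xi^2)]^2$ or $3\sigma_x^4$, we have:

For $\dfrac{\sum_1^n (x_i - E(x))^2}{n}$ the squared error $\dfrac{2\sigma_x^4}{n}$ (cf. Biometrika, II. p. 276).

For $\dfrac{\sum_1^n (x_i - M_x)^2}{n-1}$ the squared error $\dfrac{2\sigma_x^4}{n-1}$.

For $\dfrac{\sum_1^{n-1} (\Delta'x_i)^2}{2(n-1)}$ the squared error $\dfrac{(3n-4)\,\sigma_x^4}{(n-1)^2}$, or approximately $\dfrac{3\sigma_x^4}{n-1}$.

For $\dfrac{\sum_1^{n-2} (\Delta''x_i)^2}{6(n-2)}$ the squared error $\dfrac{(35n-88)\,\sigma_x^4}{9(n-2)^2}$, or approximately $\dfrac{4\sigma_x^4}{n-2}$.

For $\dfrac{\sum_1^{n-3} (\Delta'''x_i)^2}{20(n-3)}$ the squared error $\dfrac{(231n-843)\,\sigma_x^4}{50(n-3)^2}$, or approximately $\dfrac{5\sigma_x^4}{n-3}$.

For $\dfrac{\sum_1^{n-k} (\Delta^{(k)}x_i)^2}{\dfrac{(2k)!}{k!\,k!}\,(n-k)}$, finally, the squared error can be
put in the following form:

\[ \frac{2\sigma_x^4}{(n-k)^2} \Big\{ (n-k) + 2(n-k-1)\Big(\frac{k}{k+1}\Big)^2 + 2(n-k-2)\Big(\frac{k\,(k-1)}{(k+1)(k+2)}\Big)^2 \]
\[ \quad + 2(n-k-3)\Big(\frac{k\,(k-1)(k-2)}{(k+1)(k+2)(k+3)}\Big)^2 + \ldots \]
\[ \quad + 2(n-k-j)\Big(\frac{k\,(k-1)(k-2)\ldots(k-j+1)}{(k+1)(k+2)(k+3)\ldots(k+j)}\Big)^2 + \ldots \]
\[ \quad + 2(n-k-k)\Big(\frac{k\,(k-1)(k-2)\ldots 2 \cdot 1}{(k+1)(k+2)(k+3)\ldots(k+k)}\Big)^2 \Big\}. \]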
It is thus clear that with finite differencing the uncertainty of the
determination of $\sigma_x^2$ increases steadily, at first roughly in the ratio

\[ \sqrt{2} : \sqrt{3} : \sqrt{4} : \sqrt{5} \ldots \]
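7. The squared error of the mean product.

For $\dfrac{\sum_1^n \xi_i \eta_i}{n}$ the squared error is found to be

\[ \frac{E(\xi^2\eta^2) - [E(\xi\eta)]^2}{n}. \]

For $\dfrac{\sum_1^n (x_i - M_x)(y_i - M_y)}{n-1}$ the squared error is found to be

\[ \frac{E(\xi^2\eta^2) - [E(\xi\eta)]^2}{n} + \frac{[E(\xi\eta)]^2 + \sigma_x^2\sigma_y^2}{n(n-1)}. \]

For $\dfrac{\sum_1^{n-1} \Delta'x_i \Delta'y_i}{2(n-1)}$ the squared error is found to be

\[ \frac{(2n-3)\{E(\xi^2\eta^2) - [E(\xi\eta)]^2\} + (n-1)\{[E(\xi\eta)]^2 + \sigma_x^2\sigma_y^2\}}{2(n-1)^2}. \]

In the general case $\dfrac{\sum_1^{n-k} \Delta^{(k)}x_i \Delta^{(k)}y_i}{\dfrac{(2k)!}{k!\,k!}\,(n-k)}$ we obtain for
the squared error the following expression:

\[ \frac{1}{\left[\dfrac{(2k)!}{k!\,k!}\,(n-k)\right]^2} \Big\{ A_0^2\,(n-2k)\,\{E(\xi^2\eta^2) - [E(\xi\eta)]^2\} \]
\[ \quad + 2\{[E(\xi\eta)]^2 + \sigma_x^2\sigma_y^2\} \left[ A_1^2 (n-2k+1) + A_2^2 (n-2k+2) + \ldots + A_k^2 (n-2k+k) \right] \]
\[ \quad + 2B_0^2 \{E(\xi^2\eta^2) - [E(\xi\eta)]^2\} + 4\{[E(\xi\eta)]^2 + \sigma_x^2\sigma_y^2\} \{ B_1^2 + B_2^2 + \ldots + B_{k-1}^2 \} \Big\}, \]

where the coefficients $A_0^2, A_1^2, \ldots$, $B_0^2, B_1^2, \ldots$ have the same meaning as in § 6.

If $x_i$ and $y_i$ are completely equal to one another, then

\[ [E(\xi_i\eta_i)]^2 = [E(\xi^2)]^2 = \sigma_x^4; \quad E(\xi_i^2\eta_i^2) = E(\xi^4); \quad \sigma_x^2\sigma_y^2 = \sigma_x^4, \]

and the above expression coincides with that of § 6.

For the case of the "normal" distribution all these expressions can likewise be
considerably simplified, especially if $E(\xi_i\eta_i)$ and $E(\xi_i^2\eta_i^2)$
are expressed as functions of $r_{xy}$. Without entering into this here,
however, let us now make clear what is meant by the correlation coefficient.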

8. Definition of the correlation coefficient.

The correlation coefficient $R$ is usually calculated by the formula

\[ R = \frac{\sum_1^n (x_i - M_x)(y_i - M_y)}{\sqrt{\sum_1^n (x_i - M_x)^2\, \sum_1^n (y_i - M_y)^2}}. \]

Of which expression is it to be regarded as the empirical approximation: of

\[ E\left[\frac{\sum_1^n (x_i - M_x)(y_i - M_y)}{\sqrt{\sum_1^n (x_i - M_x)^2\, \sum_1^n (y_i - M_y)^2}}\right] \]

or of

\[ \frac{E\left[\sum_1^n (x_i - M_x)(y_i - M_y)\right]}{\sqrt{E\left[\sum_1^n (x_i - M_x)^2\right] E\left[\sum_1^n (y_i - M_y)^2\right]}}\ ? \]

The two formulas are by no means to be identified with one another and
coincide only to a first approximation. Since the second is considerably
easier to handle, and since this also corresponds better to the usual methods
of calculation of the English school, we define $r_{xy}$ as

\[ r_{xy} = \frac{E\left[\sum_1^n (x_i - M_x)(y_i - M_y)\right]}{\sqrt{E\left[\sum_1^n (x_i - M_x)^2\right] E\left[\sum_1^n (y_i - M_y)^2\right]}}. \]

In other words, $r_{xy} = \dfrac{p_{xy}}{\sigma_x \sigma_y}$, where $p_{xy}$,
$\sigma_x$, $\sigma_y$ have the meanings which we assigned to them above in
§§ 4 and 5.


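9. The behaviour of the correlation coefficient of two oscillating series
$x$ and $y$ when their terms are replaced by differences.


For the $k$-th finite difference of $x$ and $y$ we have

\[ r_{\Delta^{(k)}x,\Delta^{(k)}y} = \frac{E\left[\sum_1^{n-k} \Delta^{(k)}x_i \Delta^{(k)}y_i\right]}{\sqrt{E\left[\sum_1^{n-k} (\Delta^{(k)}x_i)^2\right] E\left[\sum_1^{n-k} (\Delta^{(k)}y_i)^2\right]}}
 = \frac{p_{\Delta^{(k)}x,\Delta^{(k)}y}}{\sigma_{\Delta^{(k)}x}\, \sigma_{\Delta^{(k)}y}}
 = \frac{\dfrac{(2k)!}{k!\,k!}\, p_{xy}}{\sqrt{\dfrac{(2k)!}{k!\,k!}\,\sigma_x^2 \cdot \dfrac{(2k)!}{k!\,k!}\,\sigma_y^2}}, \]

\[ r_{\Delta^{(k)}x,\Delta^{(k)}y} = \frac{p_{xy}}{\sigma_x \sigma_y} = r_{xy}. \]

We thus have quite generally the exact result:

\[ r_{xy} = r_{\Delta'x,\Delta'y} = r_{\Delta''x,\Delta''y} = r_{\Delta'''x,\Delta'''y} = \ldots = r_{\Delta^{(k)}x,\Delta^{(k)}y}. \]

Since, however, these $r$ remain unknown, and for any given
$r_{\Delta^{(k)}x,\Delta^{(k)}y}$ we know only its approximation formula

\[ R_k = \frac{\sum_1^{n-k} \Delta^{(k)}x_i \Delta^{(k)}y_i}{\sqrt{\sum_1^{n-k} (\Delta^{(k)}x_i)^2\, \sum_1^{n-k} (\Delta^{(k)}y_i)^2}}, \]

we must again determine how far one can rely in practice on the agreement of
the empirical coefficients with their mathematical expectations, that is, how
great the uncertainty of their determination is to be estimated.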
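
The invariance of $r$ under differencing can be illustrated by a small experiment. The Python sketch below is an added illustration, not part of the original paper: it assumes two normally distributed oscillatory series with expectation zero and a chosen correlation, and computes the empirical coefficient $R_k$ for $k = 0, 1, 2, 3$ from sums of products without subtracting means. All names and parameter values are hypothetical.

```python
import random
from math import sqrt

def differences(series, k):
    """k-th finite differences, with Delta' x_i = x_i - x_{i+1}."""
    for _ in range(k):
        series = [a - b for a, b in zip(series, series[1:])]
    return series

def empirical_r(xs, ys):
    """Empirical coefficient from raw sums of products (means not subtracted;
    harmless here because the simulated series have expectation zero)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return sxy / sqrt(sxx * syy)

def simulate(n=20000, r=0.6, max_k=3, seed=1):
    """Return [R_0, R_1, ..., R_max_k] for two oscillatory series with correlation r."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(u)
        ys.append(r * u + sqrt(1 - r * r) * v)   # correlated only at equal indices
    return [empirical_r(differences(xs, k), differences(ys, k)) for k in range(max_k + 1)]

print(simulate())
```

All four coefficients should lie close to the chosen value of $r$, with a scatter that grows with the order of the difference.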

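10. Mean squared error of the correlation coefficient of the finite differences
of two series.


From the formula $R_k = \dfrac{\sum_1^{n-k} \Delta^{(k)}x_i \Delta^{(k)}y_i}{\sqrt{\sum_1^{n-k} (\Delta^{(k)}x_i)^2\, \sum_1^{n-k} (\Delta^{(k)}y_i)^2}}$
the following expression can be derived:

\[ \frac{R_k - r_{\Delta^{(k)}x,\Delta^{(k)}y}}{r_{\Delta^{(k)}x,\Delta^{(k)}y}}
 = \frac{\dfrac{\sum_1^{n-k} \Delta^{(k)}x_i \Delta^{(k)}y_i}{n-k} - p_{\Delta^{(k)}x,\Delta^{(k)}y}}{p_{\Delta^{(k)}x,\Delta^{(k)}y}}
 - \frac{\dfrac{\sum_1^{n-k} (\Delta^{(k)}x_i)^2}{n-k} - \sigma^2_{\Delta^{(k)}x}}{2\sigma^2_{\Delta^{(k)}x}}
 - \frac{\dfrac{\sum_1^{n-k} (\Delta^{(k)}y_i)^2}{n-k} - \sigma^2_{\Delta^{(k)}y}}{2\sigma^2_{\Delta^{(k)}y}}, \]

which is correct only to a first approximation and when all 4 fractions of the
expression are proper fractions; this is the case when $n$ is large. From it
there further results the formula:

\[ \sigma^2_{R_k} = \frac{(1 - r_{xy}^2)^2}{(n-k)^2} \Big\{ (n-k) + 2(n-k-1)\Big(\frac{k}{k+1}\Big)^2 + 2(n-k-2)\Big(\frac{k\,(k-1)}{(k+1)(k+2)}\Big)^2 + \ldots \]
\[ \quad + 2(n-k-k)\Big(\frac{k\,(k-1)(k-2)\ldots 2 \cdot 1}{(k+1)(k+2)(k+3)\ldots(k+k)}\Big)^2 \Big\} \]

(cf. the formula for $\dfrac{\sum_1^{n-k} (\Delta^{(k)}x_i)^2}{\dfrac{(2k)!}{k!\,k!}\,(n-k)}$ in § 6).


The formula for $\sigma^2_{R_k}$ is valid only when one may set

\[ \frac{E(\xi^2\eta^2) - [E(\xi\eta)]^2}{[E(\xi\eta)]^2} + \frac{E(\xi^4) - [E(\xi^2)]^2}{4[E(\xi^2)]^2} + \frac{E(\eta^4) - [E(\eta^2)]^2}{4[E(\eta^2)]^2} \]
\[ \quad - \frac{E(\xi^3\eta) - E(\xi\eta)\,E(\xi^2)}{E(\xi\eta)\,E(\xi^2)} - \frac{E(\xi\eta^3) - E(\xi\eta)\,E(\eta^2)}{E(\xi\eta)\,E(\eta^2)} + \frac{E(\xi^2\eta^2) - E(\xi^2)\,E(\eta^2)}{2E(\xi^2)\,E(\eta^2)} \]

equal to $\dfrac{(1 - r_{xy}^2)^2}{r_{xy}^2}$ (cf. Biometrika, Vol. IX. p. 4).


From the formula for $\sigma^2_{R_k}$ we obtain:

\[ \sigma^2_{R_0} = \frac{(1 - r_{xy}^2)^2}{n}, \]
\[ \sigma^2_{R_1} = \frac{(1 - r_{xy}^2)^2}{n-1} \cdot \frac{3n - 4}{2(n-1)}, \]
\[ \sigma^2_{R_2} = \frac{(1 - r_{xy}^2)^2}{n-2} \cdot \frac{35n - 88}{18(n-2)}, \]
\[ \sigma^2_{R_3} = \frac{(1 - r_{xy}^2)^2}{n-3} \cdot \frac{231n - 843}{100(n-3)}, \]

etc.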
The squared errors of the correlation coefficients of successive orders of
difference are consequently to one another approximately as $2 : 3 : 4 : 5 \ldots$


The uncertainty therefore increases with the order of the difference roughly in
the ratio

\[ \sqrt{2} : \sqrt{3} : \sqrt{4} : \sqrt{5} \ldots \]
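
The growth of the uncertainty with the order of the difference can be tabulated directly. The Python sketch below is an added illustration, not part of the original paper: it evaluates the factor multiplying $(1 - r_{xy}^2)^2$ in the formula for $\sigma^2_{R_k}$ of § 10, on the assumption of normally distributed oscillatory series and $n \geq 2k$. The function name is hypothetical.

```python
from fractions import Fraction

def r_error_factor(n, k):
    """Factor f such that sigma^2_{R_k} = (1 - r^2)^2 * f, i.e. the braces of
    the section 10 formula divided by (n - k)^2."""
    total = Fraction(n - k)
    ratio = Fraction(1)
    for j in range(1, k + 1):
        # ratio = k(k-1)...(k-j+1) / ((k+1)(k+2)...(k+j))
        ratio *= Fraction(k - j + 1, k + j)
        total += 2 * (n - k - j) * ratio ** 2
    return total / (n - k) ** 2

n = 100
for k in range(4):
    # Twice n times the factor: 2 for k = 0, then somewhat below 3.0, 3.9
    # and 4.6 for large n, which the text rounds to 2 : 3 : 4 : 5.
    print(k, float(2 * n * r_error_factor(n, k)))
```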
11. Correlation coefficient of two composite series consisting of oscillatory
and evolutionary elements.


Since "Student" has set out this question aptly, we can be brief.
If we bear in mind that for our purposes the evolutionary component of a
series has already vanished in practice when it has become so small in
proportion to the oscillatory component that it can influence only the 3rd,
4th, etc. decimal places of the expression for $R$, we come to the conclusion
that not only components which can be represented by a parabola of higher
order, but also those which satisfy only transcendental equations (e.g. sine
series), are eliminated by finite differencing. Indeed, more than this, it can
be proved that all more or less "smooth series" whatever, all those in which a
sufficient positive correlation between neighbouring terms is perceptible,
vanish for practical purposes under finite differencing. The generalised
Cave-Hooker method is therefore evidently a very universal means of extracting
the correlation of the oscillatory elements from composite series. It has,
however, a catch, to which attention must still be drawn here.


12. Can one determine from the behaviour of the series $R_0, R_1, R_2, \ldots, R_k$
whether we have before us the correlation coefficient of purely oscillatory
series? "Student" appears to believe that if any $R_j$ is equal to its
predecessor $R_{j-1}$, we are certainly dealing with the correlation
coefficient of oscillating elements. Against such a conclusion an emphatic
warning must be given. As my calculations (which are somewhat too lengthy for
this article) show, two neighbouring coefficients $R_j$, $R_{j+1}$ can be
approximately equal to one another even for strongly evolutionary series, and
the probability of such a coincidence is by no means to be rated very low.
Only if, starting from some $R_j$, we always obtain the same value for $R$,
that is, $R_j = R_{j+1} = R_{j+2} = R_{j+3} = \ldots$, will such a conclusion be
justified, and the longer the series of equal $R$, the more probable this
conclusion becomes.
STATISTICAL NOTES ON THE INFLUENCE
OF EDUCATION IN EGYPT.


By M. HOSNY, M.A., B.Sc. 


The statistical returns for Egypt are—as compared with European data—still 
in a somewhat elementary stage. Age-distributions are of very little value, and 
in the case of infantile mortality we have only information for certain towns. 
Further, in the larger towns there is a considerable cosmopolitan element, which 
gives them a widely different character from the often sparsely populated rural and 
desert districts. Education is not compulsory, and schools and literacy are largely 
confined to Cairo, Alexandria and the Canal Government, even when we exclude 
all foreign scholars. In the same way criminality* preponderates, in an inverse 
order it is true, in these three districts, but it is not absolutely certain whether this 
is due to their more efficient policing, to the presence of more foreigners, or to a 
real absence of crime in the rural populations. Crime does not appear to arise in 
Egypt from poverty or drunkenness, two of the main factors of its origin in 
Western Europe. The criminal, indeed, is rarely habitual; he is an amateur, 
rather than a professional, and criminals are more often well-to-do, their crimes 
arising from motives of revenge or passion. 


The fact that criminality in Egypt is highly correlated with literacy and 
scholarship would be noteworthy and might possibly be used as an argument 
against education, did not the association of crime and education arise from the 
prevalency of both in the more populated districts, where again we find the 
greatest abundance of foreigners. Naturally such questions arise as: 


(i) Are the foreigners—and if so, which section of them—to any extent 
responsible for the prevalence of crime in the districts frequented by them ? 


(ii) If we allow for urban conditions, will there still be found a high asso- 
ciation of crime and education ? 


It is perfectly easy to obtain from the Egyptian Census (we used that of
1907) the number of foreigners of each denomination in the various Egyptian


* We understand by “criminality” in this paper, not commission of but conviction for crime. 
governments. The only difficulty here was the presence of British troops in
Cairo and Alexandria, which placed that nationality in an anomalous position. 
These were estimated approximately and subtracted. The following groups of 
foreigners were then dealt with: (a) Ottomans, (b) British subjects, French, 
Austrians, Germans and Russians*, (c) Greeks, (d) Italians. The Greeks and 
Italians were separated from the general European group (b), because they 
are largely differentiated, the Greeks being frequently small traders and the 
Italians often manual workers. Their large numbers also justified a separate 
classification. 


Table I gives the foreigners per 10,000 in the 17 Egyptian districts we
were able to deal with. It will be noted that the Greeks far outnumber other
foreigners, but that all foreigners are concentrated in the Cairo, Alexandria and
Canal governments.


TABLE I.


Foreigners per 10,000 and Population per sq. kilometre.


                                      Europeans other                        Population
Governments               Ottomans    than Greeks and    Greeks   Italians   per sq.
                                      Italians                               kilometre

Cairo                        453           312             298       204       6060
Alexandria                   661           514             745       482       6780
Canal†                       416           583             846       445       7666
Behera†                       43            23              31        11        178
Charkieh                      29             5              24         1        257
Dakahlieh and Damietta        18             5              18         3        346
Gharbieh                       2             0               2         0        226
Kalliuhieh                     8             3              13         2        469
Menufieh                       2             1               7         0        618
Assiut†                        4             2               3         1        454
Assuan                         8             5              13         4        533
Beni Suef                     15             6               9         1        351
Fayoum                        10             3               4         0        255
Gerga                          2             0               2         0        532
Guizeh                         6             6               4         3        447
Kena†                          5             4               4         2        339
Minia†                        10             5               7         1        458


It was far more difficult to obtain a measure of urban conditions. We had 
to take very rough measures of the density of the population, because the limits 
of certain areas are too vaguely defined to be of any service. El Arish has been
excluded from the Canal district; Suez and Sinai have also been excluded, as there
is no enumeration of them with respect to criminality, literacy and scholarship.
These densities, with such value as they have, are given in the last column of
Table I.


* The contributions from other smaller nationalities were omitted. 
† Various approximations and omissions occur in these cases in obtaining density.
Table II provides the number of male criminals per 1000 of the male
population, the literacy or number of male persons able to read and write per 100 of
the male population*, and the number of male scholars aged 5 to 19 per 100 of
the native boys of those ages†.


TABLE II. Educational and Criminal Indices.


                            Male             Literacy         Male Scholars
Governments                 Criminals per    per 100 males    5 to 19, per 100
                            1000 males                        boys of those ages

Cairo                          12.90            28.03             30.20
Alexandria                     14.15            30.09             19.99
Canal and El Arish             22.30            23.39              8.54
Behera                          5.30             9.29              1.01
Charkieh                        4.20             9.09              1.66
Dakahlieh and Damietta          4.35          [illegible]          1.76
Gharbieh                        5.65             8.32              3.04
Kalliuhieh                      6.65             8.13              1.39
Menufieh                        3.85             8.45              1.06
Assiut                          5.45             7.01              4.02
Assuan                          4.20             7.68              0.82
Beni Suef                    [illegible]         8.42              2.03
Fayoum                          6.85             6.54              1.85
Gerga                           3.95             5.84              2.22
Guizeh                          5.10             6.38              1.15
Kena                            3.45             5.34              1.66
Minia                           5.20          [illegible]       [illegible]


We shall use the following symbols to denote the factors which occur in
Tables I and II:
O = Ottomans, G = Greeks, I = Italians,
E = Europeans other than Greeks and Italians,
C = Criminality, L = Literacy, S = Scholarship, D = Density of Population.
Each government was treated as of equal weight, although the populations
vary from 233,000 in Assuan to 1,485,000 in Gharbieh. The standard-deviations
and product-moments were found without grouping. The following results were
obtained:


Means                   Standard Deviations          Correlations
$M_C = 7.015$,          $\sigma_C = 4.791$,          $r_{CL} = +.8450 \pm .0468$,
$M_L =$ [illegible],    $\sigma_L =$ [illegible],    $r_{CS} = +.6242 \pm .0999$,
$M_S = 5.031$,          $\sigma_S = 7.735$,          $r_{LS} = +.9028 \pm .0308$,
$M_D = 1528$,           $\sigma_D = 2475.5$.
Correlations:


$r_{DC} = +.9614 \pm .0124$, $r_{DS} = +.8097 \pm .0566$, $r_{DL} = +.9563 \pm .0138$.
Now at first sight these results would seem to indicate a very bad influence of 
education on crime. Where literacy and scholarship are greatest, there criminals 


* Egyptian Census, 1907, p. 99.

† Foreign male scholars are excluded in the case of Cairo, Alexandria and the Canal. They have
no sensible numerical existence elsewhere. Criminals and scholars are taken from the Annuaire
Statistique de l'Égypte, 1912, pp. 95 and 135.
are most numerous! And a superficial argument might be used to condemn the
character of education in Egypt, or education in general. But it will be clear 
on examination of the isolated values that the observed high correlations arise 
solely from the urban character in Egypt of both criminality and education. We 
have endeavoured therefore to correct this by finding the partial correlations for 
constant density of population. 


There now result

${}_D\rho_{CS} = -.9554 \pm .0143$,

${}_D\rho_{CL} = -.9231 \pm .0242$,

${}_D\rho_{LS} = +.7480 \pm .0721$.
Thus, while there still remains a quite considerable relation between the prevalence 
of literacy and scholars for constant density, we find that for a constant degree of 
urban conditions, the greater the literacy and the greater the amount of education 
the less will be the criminality. The negative correlations are now even higher 
than the uncorrected positive ones and of course are markedly significant. While 
admitting the slender nature of the Egyptian data, we think that this swinging 
over of the relation of crime and education when we correct for density is sug- 
gestive, and it would be of interest to work out similar correlations for states 
in which the statistics are of a more ample character. It does, however, appear 
reasonable to assert that there is no evidence to indicate that education leads 
to criminality—rather the reverse—in Egypt. 
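
The correction for density can be retraced with a short computation. The Python sketch below is an added illustration, not the author's own working: it assumes the standard first-order partial correlation formula and the probable-error form $0.6745\,(1 - r^2)/\sqrt{n}$ with $n = 17$ governments, both of which reproduce the figures quoted in the text to within rounding. The function and variable names are hypothetical; C, L, S and D stand for criminality, literacy, scholarship and density.

```python
from math import sqrt

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y for constant z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def probable_error(r, n):
    """Probable error of a correlation coefficient, assumed here to be
    0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

# Raw correlations as printed in the text.
r_CL, r_CS, r_LS = 0.8450, 0.6242, 0.9028
r_DC, r_DS, r_DL = 0.9614, 0.8097, 0.9563

rho_CS = partial_correlation(r_CS, r_DC, r_DS)   # crime and scholarship, constant density
rho_CL = partial_correlation(r_CL, r_DC, r_DL)   # crime and literacy, constant density
rho_LS = partial_correlation(r_LS, r_DL, r_DS)   # literacy and scholarship, constant density

for name, rho in (("CS", rho_CS), ("CL", rho_CL), ("LS", rho_LS)):
    print(name, round(rho, 4), "+/-", round(probable_error(rho, 17), 4))
```

Because the raw coefficients are themselves rounded to four places, the recomputed partial coefficients may differ from the printed ones in the third decimal place.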


We will next consider the influence of the presence of foreigners in Egypt. 
We find: 


Means                   Standard Deviations          Correlations
$M_O = 99.53$,          $\sigma_O = 195.60$,         $r_{OC} = +.8425 \pm .0478$,
$M_E = 86.88$,          $\sigma_E =$ [illegible],    $r_{EC} = +.9546 \pm .0145$,
$M_G = 119.41$,         $\sigma_G = 256.61$,         $r_{GC} = +.9429 \pm .0181$,
$M_I = 68.24$,          $\sigma_I = 152.04$,         $r_{IC} = +.9192 \pm .0254$.
Correlations:
$r_{DO} = +.9575 \pm .0136$, $r_{DE} = +.9844 \pm .0050$,
$r_{DG} = +.9491 \pm .0162$, $r_{DI} = +.9617 \pm .0123$.


Here, if we judged by the raw correlations only, we must assert that the corre- 
lations of crime with the presence of foreigners are so high, that the foreigners 
must be corrupting the Egyptian population. But again the association only 
arises because the criminals and foreigners are both prevalent in the big towns. 
If we correct for density of population, we find the results are very different. Thus 


we have:
${}_D\rho_{OC} = -.9811 \pm .0061$, ${}_D\rho_{EC} = +.1692 \pm .1591$,
${}_D\rho_{GC} = +.3524 \pm .1483$, ${}_D\rho_{IC} = -.0718 \pm .1628$.


It is now obvious that the correlation of Europeans other than Greeks and 
Italians with criminality has become insignificant having regard to its probable 
error; the correlation of the presence of Italians and criminality is now negative, 
but less than its probable error. Thus of Christians only the presence of the 
Greeks may possibly, but not certainly, be detrimental. The Ottomans have
now a large negative correlation of a quite significant character, or we might 
assert that the presence of Ottomans tends to diminish criminality. The Greeks 
are frequently moneylenders and alcohol dealers, and the Ottomans, especially 
the Arabs, have among them a good many religious teachers. 


We have, however, to note that criminality is greatest in the Canal Government,
where Europeans and Greeks are most frequent, while the Ottomans are
most numerous in Alexandria, where crime is almost 40% less than in the Canal 
Government. To test the influence of the three densely populated governments, 
we put the Canal proportion of the Ottomans at Cairo, that of Cairo at Alexandria 
and that of Alexandria at the Canal. There resulted: 

$r_{OC} = +.9707$, instead of $+.8425$,
$r_{DO} = +.9870$, instead of $+.9575$,

leading to ${}_D\rho_{OC} = +.4918$,

or we may safely say, that if the proportions of Ottomans at Alexandria and along 
the Canal were interchanged, then no relation between the presence of Ottomans 
and the absence of criminality would exist, indeed the relation would probably be 
reversed. The prevalence of the Ottomans in Alexandria has been attributed to its 
more temperate climate. There is certainly a large Ottoman element in Alexandria, 
there being 21,827 Ottomans out of a population of 332,246, and it is larger 
than any other foreign element except the Greeks. In Cairo, with 29,516 out 
of 654,476 inhabitants, the Ottomans exceed any other single foreign element. 
It is conceivable, therefore, that they may be able to influence the moral tone of 
those towns. It must be borne in mind, however, that crime is far more frequent 
in the Cairo and Alexandria governments than in the more purely rural districts, 
and we can scarcely suppose that Cairo and Alexandria would reach the still higher 
criminality level of the Canal, were it not for the presence of the Ottomans. In 
the Canal Government there are exceptional conditions, and we can hardly assume 
that a transfer of the Ottomans from Alexandria to the Canal would interchange 
their proportions of criminality. Greeks no doubt flock to the Canal for business 
purposes, the other Europeans largely for control purposes; the Ottomans, 
relatively speaking, avoid it. Without further analysis it would not be possible 
to assert definitely that the presence of Ottomans reduces crime. It may be 
doubted whether the presence of foreigners, with the possible exception of the 
Greeks, is really associated with the extent of criminality in Egypt. 

A further investigation was undertaken in regard to the possible influence 
of education on infantile mortality. The birthrate and deathrate in Egypt are 
both remarkably high. Thus for the years 1899-1909 inclusive the average 


rates were:

               Births per 1000*    Deaths per 1000    Excess of Births over Deaths

Cairo               40.7                35.7                    +5.0
Alexandria          38.0                31.8                    +6.2


* Still births not included. 
Many European towns with half the above birthrates have considerably greater
excesses of births over deaths. 


Unfortunately the infantile mortality is only recorded in Egyptian towns and 
not in the governments at large or in the rural districts. We are obliged, there- 
fore, to deal with these only when considering the relation of education to infantile 
mortality. From p. 49 of the Annuaire Statistique de l'Égypte, 1912, we obtain
the infantile mortality for 1911, and note at once how extraordinarily high it 
stands. From p. 286 of the Census of Egypt, 1907, we take the percentage of 
male literates in the total male population, and from the Statistique Scolaire, 
1912-1913, p. 74, the percentage of scholars in the total population*. As it 
was possible that the density of town population might influence the results, we 
took the number of persons per house, which was about the only social factor 
available. This will probably represent fairly closely the average size of family.
This was taken from the Census, 1907, p. 286. The mean, 5°64 persons per 
house, suggests that the average number of living children can hardly exceed 
three. The marked relationship that occurs in European towns between gross size 
of family and infantile mortality cannot be satisfactorily tested on the Egyptian 
data, because we cannot ascertain the infantile mortality in each size of family. 
The number of persons to the house is indeed rather a measure of net than 
gross family, and we only know this as an average value for each town. It does 
not follow that a town with a low number of persons per house is one with 
small gross families; the low number may be due to the heavy infantile mortality 
itself. Accordingly the correlation between persons per house and infantile 
mortality is not necessarily even a measure of the influence of overcrowding on 
infantile mortality (although this is often supposed to be the case); it is con- 
ceivable that a high infantile mortality might be the source of a low number of 
persons per house, and the unravelling of cause and effect is only possible 
where we know not only the number of persons per house, but its relation to 
both the gross and net family of that house. 


Let I = Infantile mortality, L = Literacy, S = Scholars, P = Persons per house.
Then we have the following results:


Means                   Standard Deviations          Correlations
$M_I = 29.30$,          $\sigma_I = 7.608$,          $r_{IL} = -.1040 \pm .1500$,
$M_L = 21.96$,          $\sigma_L =$ [illegible],    $r_{IS} = +.5309 \pm .1278$,
$M_S = 5.569$,          $\sigma_S = 2.951$,          $r_{LS} = +.0093 \pm .1508$,
$M_P = 5.643$,          $\sigma_P = 1.428$.


Correlations:
$r_{PI} = +.1675 \pm .1487$, $r_{PL} = -.3421 \pm .1487$, $r_{PS} = +.0296 \pm .1508$.


* The scholars were taken for 1910-1911, the year of infantile mortality, but this involved 
the assumption that the foreign scholars were the same in numbers in 1910-11 and 1912-13, probably 
not a very inaccurate assumption, which in any case affects little more than Cairo, Alexandria, Suez 
and Ismailia practically. It is the number of Egyptian scholars that is rapidly changing and the 
scholars dealt with in our ratio are Egyptian only. 




TABLE III.


Infantile Mortality, Persons per House and Education.


                    Infantile         Male Literacy     Scholars          Persons
Town                Mortality         per 100 of        per 100 of        per
                    per 100 births    population        population        House

Cairo                  32.9              28.03             6.12            4.62
Alexandria             26.9              30.09             3.41            8.43
Damietta               18.1               7.06             1.89            6.96
Port Said              21.0              24.13             2.64            4.14
Ismailia               16.0              28.05             1.22            4.10
Suez                   26.9              25.74             0.64            3.85
Benha                  29.6              20.54             4.87            5.47
Zagazig                27.9              25.67             6.47            6.07
Tantah                 29.6              25.62             9.45            5.18
Mansorah               21.4              26.77             6.20            4.60
Chibine El Kom         16.1              18.71             5.10            4.92
Damanhur            [illegible]          19.54             3.55            7.27
Guizeh                 35.6              19.23          [illegible]        6.12
Fayoum                 40.1              15.55             4.78            9.34
Beni Suef           [illegible]          21.15             7.19            5.11
Minia                  38.2              21.96             8.89            4.96
Assiut                 33.6              20.92            11.48            6.00
Sohag                  29.0           [illegible]          5.58            6.15
Kena                [illegible]          16.76             4.80            5.14
Assuan                 41.1              21.31             5.78            4.09


It will be clear from these results that there is no significant relation between the 
literacy of the male population and infantile mortality. There is also no significant 
relation between the number of persons to a house and the number of scholars, 
i.e. it does not appear to be the more crowded towns which have the largest
percentage of scholars to the population ; Alexandria and Damietta, for example, 
have considerably more than the mean number of persons to the house and 
relatively few scholars. On the other hand a larger number of literates marks 
less crowding. Crowding and infantile mortality are slightly related, but con- 
sidering the probable error, not with definite significance *. 


While literacy has no relation to the infantile deathrate, it is noteworthy 
that there is a significant correlation ($+.5309 \pm .1278$) between the number of
scholars and the infantile deathrate, which is greater where there is more education.
Now this either suggests that many scholars mean large families and large 
families correspond to increased infantile mortality, which is usual, or that the 
towns in which there are the classes who educate their children have a higher 
infantile deathrate. The only means, and those inadequate, of testing the first 


* This agrees with the result for overcrowding and infantile mortality in English manufacturing 
towns, where the correlation is very small and sometimes has one sign and sometimes the other. 




assumption are to take the partial coefficient between scholars and deathrate for 
constant number of persons per house. 


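We find*: partial coefficient (scholars and deathrate, persons per house constant) = +.5336 ± .1107;
similarly: partial coefficient (literacy and deathrate, persons per house constant) = -.0504 ± .1543.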


There is thus a slight increase in the relation of scholars to deathrate when 
we take a constant number of persons per house, and it is hard to believe that 
the relation is indirectly due to size of family. The second result shows that 
literacy has no relation to the infantile deathrate. Towns like Alexandria, 
Damietta, Port Said, Ismailia, and to a less extent Suez, with a low infantile 
deathrate have a low education rate, and towns like Cairo, Guizeh, Beni Suef, 
Minia, and Assiut, with high infantile deathrates have high education rates. The 
first towns are on the sea or the canal, the second in the Nile Valley; it is con- 
ceivable that the latter are the more unhealthy for the infant; it would need 
special local knowledge to explain why education has been most accepted above 
Cairo†. There does not, however, seem any relation between ignorance, as
measured by literacy, and a heavy infantile mortality, nor on the other hand can we 
assert that education and European influence have certainly increased criminality. 


* The value of p7,5 is + °0207 +1508, and is therefore not significant. 

† It is noteworthy that there is no relation between literacy and number of scholars, i.e. education
of children does not appear to follow the power to read and write in their parents, that is to say 
if we judge by the averages in towns and not by individuals. 
HEIGHT AND WEIGHT OF SCHOOL CHILDREN
IN GLASGOW


By ETHEL M. ELDERTON, Galton Fellow, University of London. 


In 1905-6 an enquiry was made in the Public Schools of the School Board for 
Glasgow as to the height and weight of all the scholars, the occupation of the 
parents, the number of rooms occupied etc. By permission of Sir John Struthers, 
of the Scottish Education Office, these schedules were most kindly placed at the 
disposal of the Galton Laboratory. 


The number of children concerning whom the enquiry was made is over 
seventy thousand of ages 5 to 18 years. The schools from which these children 
came were divided into four groups according to the district in which the schools 
were situated. 


Group A comprised schools in the poorest districts of the city.
Group B comprised schools in poor districts of the city.
Group C comprised schools in districts of a better class.
Group D comprised schools in districts of a still higher class, with which are
included four out of five Higher Grade Schools.


The data were originally used by the Galton Laboratory with the object of 
discovering how far the physique of school children, judged by their height and 
weight, is affected by the occupation of the father and the employment of the 
mother. With this end in view the necessary data were entered on cards. 
Children over 14 were excluded and all children who had not both parents alive 
were also excluded; this left us with 30,965 girls and 32,811 boys. 


The object of the present paper is to ascertain what is the average weight of 
a child of a given age and a given height. 


The first step in this enquiry was to sort the cards and form tables giving the 
distribution of weight for each height at each age in each school group, and this 
laborious work was carried out very largely by Miss Augusta Jones; this step 
necessitated making 72 tables, and she is responsible for 58 of them while the 
remaining 14 are due to Miss H. Gertrude Jones; I have to thank most heartily
these colleagues for their very efficient help in this matter. 


The three factors with which we are concerned are the age, height, and weight
of school children. The instructions issued to the teachers in the schools for
recording these three facts were that ages were to be given to the nearest year,
weights to the nearest pound, and heights to the nearest quarter of an inch*. The
method of recording ages is very important. Ages being recorded to the nearest
year, children classed as 6 years were from 5.5 years to 6.5 years,
and the average age of this group was 6 years. This is not the method most
frequently employed for recording ages; “age last birthday” is generally used,
and if “age last birthday” be given as 6 years then the children of that age are
from 6 to 7 and the average age of children in this group is approximately 6.5 years.
It will be seen at once that a comparison of weights and heights of two groups of
children of 6 years cannot be undertaken until we know which method of recording
ages has been adopted. The height and weight of these Glasgow children have been
compared by Dr Leslie Mackenzie and Captain Foster† with the height and weight
of children as given by the Anthropometric Committee of the British Association
and it is pointed out that at each age the average weight of the children is
uniformly below the “standard of the Anthropometric Committee,” and that
generally speaking the same thing applies to height. As a matter of fact this
point as to age has not been noticed by these writers, and children whose average
age is 6 years in Glasgow are compared with children whose average age is
6.5 years; naturally the younger children are shorter and lighter. There is further
an important question to be asked: Which standard of the B.A. Anthropometric
Committee ought to be selected? To this point I return below.
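
The two conventions for recording age can be made concrete with a short sketch. The function names are illustrative only, and the treatment of an exact half year (rounded up here) is an assumption, since the instructions to the teachers leave that case open.

```python
def nearest_year(years, months):
    """Age recorded to the nearest year (the Glasgow convention).

    Assumption: an exact half year (6 months) is rounded up; the source
    does not say how that case was actually handled.
    """
    if not 0 <= months < 12:
        raise ValueError("months must lie between 0 and 11")
    return years + 1 if months >= 6 else years


def last_birthday(years, months):
    """Age last birthday: completed years only."""
    if not 0 <= months < 12:
        raise ValueError("months must lie between 0 and 11")
    return years


def group_centre(label, convention):
    """Approximate mean true age of the children sharing a recorded age."""
    if convention == "nearest_year":
        return float(label)   # group runs from label - 0.5 to label + 0.5
    if convention == "last_birthday":
        return label + 0.5    # group runs from label to label + 1
    raise ValueError("unknown convention")
```

Under these definitions a child of 6 years 7 months is entered as 7 in Glasgow but as 6 under "age last birthday", and two groups both labelled "6" differ by about half a year in mean age.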


As I have said the Glasgow children’s ages were recorded to the nearest year‡,
but the Anthropometric Committee recorded age last birthday, and before these 
children can be compared the six months extra growth must be allowed for. This 
is quite easily done by finding the regression of height and weight on age and 
adding half the regression coefficient to the height and weight of the Glasgow 
children. We have found the regression for children of 5 to 14 inclusive to be as 
follows: 


                                     Boys     Girls
Regression of Weight on Age  ...     4.564    4.916
Regression of Height on Age  ...     1.807    1.937
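
The six months' allowance is simply half the yearly regression coefficient. A minimal sketch of that arithmetic follows, using the four coefficients quoted above; the function and dictionary names are illustrative and not part of the original enquiry.

```python
# Regression of weight (lbs.) and height (inches) on age, per year,
# as quoted in the text for children of 5 to 14 inclusive.
REGRESSION_ON_AGE = {
    "boys": {"weight_lbs": 4.564, "height_in": 1.807},
    "girls": {"weight_lbs": 4.916, "height_in": 1.937},
}


def half_year_allowance(sex):
    """Return (weight, height) to add for six months of extra growth."""
    coeffs = REGRESSION_ON_AGE[sex]
    return 0.5 * coeffs["weight_lbs"], 0.5 * coeffs["height_in"]


def shift_to_age_last_birthday(sex, mean_weight, mean_height):
    """Move a 'nearest year' group mean to the 'age last birthday' centre."""
    extra_weight, extra_height = half_year_allowance(sex)
    return mean_weight + extra_weight, mean_height + extra_height
```

For boys the allowance works out at about 2.28 lbs. and .90 inches, and for girls at about 2.46 lbs. and .97 inches.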


* It is not known what record was made when an exact half year, an exact half pound or an exact
quarter inch occurred.

† Report on the Physical Condition of Children attending the Public Schools of the School Board for
Glasgow, by Dr W. Leslie Mackenzie and Captain A. Foster. Wyman and Sons, 1907.

‡ The actual wording of the Glasgow direction to school teachers runs: “In recording age, disregard
months and record to nearest year; thus 6 years 7 months record as 7 years, 8 years 3 months
record as 8 years.” It is not clear how 6 years 6 months would be recorded; we have assumed, as no half
years are entered in the schedules, that an exact calculation was made in the case of each child of
doubtful age to ascertain whether it was or was not past the half year.
This means that we must add 2.28 lbs. to the weight of the Glasgow boys and
.90 inches to their height, and 2.5 lbs. to the weight of Glasgow girls and .97 inches
to their height before we can compare them with the Anthropometric Committee’s 
standard. The Glasgow children still fall below the “ Anthropometric Committee’s 
Average” but not to the appalling extent shown in the diagram at the end of 
Dr Mackenzie’s and Captain Foster’s Report. Personally I should hesitate to 
compare actual height and weight of Glasgow school children with the so-called 
Anthropometric Committee’s standard. The so-called standard is taken from the 
Final Report of the Anthropometric Committee of the British Association, 1883. 
In Tables XVI—XIX, the average heights and weights at different ages of males 
and females of different classes of the population of Great Britain are given. For 
example in the case of stature we have four classes: Class I, Professional Classes, 
Town and Country, 10,739 individuals, ages 9 to 60; Class II, Commercial Classes, 
Towns, 5472 individuals, ages 8 to 60 (5 below 8 are of no service for means); 
Class III, Labouring Classes, Country, 8727 individuals, ages 3 to 70 (8 below 3
are of no service); Class IV, Artizans, Towns, 126,236 individuals, ages 3 to 60, and
451 babies at birth. All these data are pooled in the column headed “General
Population, All Classes, Town and Country,” and it is this “General Population”
which is so frequently cited by various medical authorities, including Dr Leslie
Mackenzie and Captain Foster, as the Anthropometric Committee's “standard.”
What they understand by such a “standard” it is impossible to say. It does not
represent the “General Population” of Great Britain, but the total population 
measured by the Committee. In this all the babies are artizan babies, there 
are only 8 children from 0 to 2 and these belong to the labouring rural classes, and 
there is no professional class contribution until after 9 years of age. Then the 
various age groups are made up from various social classes in proportions which 
bear no relation whatever to their actual proportions in the kingdom at large. For 
example, the average height of lads of 18 is determined from 1724 of the pro- 
fessional, 62 of the commercial, 148 of the rural labourer, and 371 of the town 
artizan classes! It will be quite clear that a “standard” reached in this way 
means absolutely nothing at all, and yet this is the “standard” which, attached to 
numerous weighing machines is posted in innumerable public places up and down 
this country. It does not in the least represent any “General Population” of 
Great Britain. To be a standard of the general population each class should have 
been properly weighted, and this cannot be done as in certain classes certain ages 
are quite inadequately represented, or not represented at all. There is in fact no 
such thing as an “ Anthropometric Committee’s standard” for either height or 
weight. The only thing that is possible is to compare the corresponding social 
class in that Committee’s measurements with the measurements under considera- 
tion. In the case of Dr Leslie Mackenzie’s and Captain Foster’s data, this is 
undoubtedly the Class IV, “ Artizans, Towns.” Such a comparison is made in the 
accompanying diagrams. It will be seen that the Glasgow children as far as 
height is concerned are the equals if not the superiors of the Anthropometric 
Committee’s artizan class. In weight they appear to be somewhat less, but here
Dr Leslie Mackenzie and Captain Foster have overlooked the fact that the Glasgow
children were weighed without boots, but the British Association Committee weighed
in ordinary indoor clothing, i.e. with boots or shoes on. Now girls’ boots weigh
as much as 1¼ to 2¼ lbs. and boys’ boots 1¾ to 3½ lbs.* Hence in comparing
children in Glasgow with those six months older, Dr Leslie Mackenzie and Captain
Foster have dropped 2¼ lbs. in weight, while in comparing children without boots with
those with boots they have dropped another 1¼ lbs. to possibly 3½ lbs. We should
anticipate therefore that their readings would be 3½ to nearly 6 lbs. too small†.
There is in our mind very little doubt that the weight of the Glasgow children is 
at every age equal or superior to the weight of the artizan children measured 
by the British Association Anthropometric Committee and the statement of 
Dr Leslie Mackenzie and Captain Foster that “at each age from 5 to 18 the average
weight of the [Glasgow] children is uniformly below the standard of the
Anthropometrical Committee‡” arises from their having entirely overlooked the
conditions as to class, age and manner of weighing which were adopted by that
Committee, a knowledge of which was essential to any comparison with the
Committee’s data. In the diagrams on pp. 292-3 we have given the Glasgow
measurements set against those of the artizan class of the Anthropometric Committee,
and the reader will see clearly how all the arguments based on differences between 
the Glasgow and the “ Anthropometric standard” fall at once to the ground. 
There is nothing exceptional in the Glasgow data, they differ of course from data 
for the children of the professional classes, but this difference is not confined to 
Glasgow. Apart from this point it is essential that the ages of the two groups of 
children should be the same and not differ by six months. 
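
The objection to the pooled “General Population” column can be illustrated from the figures quoted above for lads of 18. The sketch below only computes each class's share of that age group; the variable and function names are illustrative, and no class means are assumed.

```python
# Numbers of lads of 18 contributed by each class, as quoted in the text.
LADS_OF_18 = {
    "professional": 1724,
    "commercial": 62,
    "rural labourer": 148,
    "town artizan": 371,
}


def class_shares(counts):
    """Return each class's fraction of the pooled total."""
    total = sum(counts.values())
    return {name: number / total for name, number in counts.items()}


shares = class_shares(LADS_OF_18)
for name, share in shares.items():
    print(f"{name:15s} {100 * share:5.1f} per cent")
```

Roughly three quarters of the pooled group at this age come from the professional classes, so a pooled mean at 18 is in effect a professional-class mean, whatever the true proportions in the kingdom at large may be.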


In the data used for this paper, children of 5 were omitted; they are few in 
number and are not therefore likely to give such reliable results when each age 
group is used separately. The mean weight for each height in inches was then 
found and the regression equation calculated. These equations are given in 
Table I. It will be observed from these equations that, though some irregularities 
occur, generally speaking weight increases more rapidly for a given height in the 
better school groups, at the later ages, and for girls more than boys except at 
ages 6 and 7.

We can see from these equations that the multiple regression surface for 
weight on height and age is not absolutely planar. It can be shown that it is 


* New “tacket” boots for girls of five in Glasgow weigh 1 lb. 5 oz., falling to about 1 lb. 3 oz.
when the tackets are worn down; for girls of fourteen 2 lbs. 6 oz., falling to about 2 lbs. 2 oz. For boys
of five years new tacket boots weigh 1 lb. 14 oz., falling to about 1 lb. 11 oz. when worn down; for boys
of fourteen the former weigh 3 lbs. 9 oz. and the latter about 3 lbs. 3 oz. We have to thank Dr Chalmers,
M.O.H. for Glasgow, for this information.

† Many public elementary school children have great masses of metal on their boots. Undoubtedly
the older children have heavier boots, and we can see from the diagrams that the divergence of the
Glasgow children from the Anthropometric Committee’s artizan children increases with age.

‡ Report, Scottish Education Department, 1907, p. iv.
TABLE I.

Glasgow.

Age | Group A, Boys          | Group B, Boys          | Group C, Boys           | Group D, Boys
  6 | W = -21.164 + 1.503H   | W = -22.576 + 1.532H   | W =  -24.489 + 1.591H   | W =  -25.678 + 1.603H
  7 | W = -21.065 + 1.519H   | W = -26.308 + 1.635H   | W =  -27.390 + 1.667H   | W =  -34.819 + 1.818H
  8 | W = -19.381 + 1.495H   | W = -28.127 + 1.695H   | W =   -6.640 + 1.227H   | W =  -32.777 + 1.790H
  9 | W = -24.652 + 1.635H   | W = -32.826 + 1.818H   | W =  -20.671 + 1.562H   | W =  -36.030 + 1.883H
 10 | W = -29.589 + 1.768H   | W = -35.696 + 1.899H   | W =  -42.942 + 2.055H   | W =  -51.636 + 2.218H
 11 | W = -54.910 + 2.302H   | W = -39.725 + 2.005H   | W =  -43.628 + 2.088H   | W =  -67.664 + 2.546H
 12 | W = -49.513 + 2.217H   | W = -62.890 + 2.476H   | W =  -55.386 + 2.337H   | W =  -62.052 + 2.450H
 13 | W = -65.467 + 2.547H   | W = -63.516 + 2.511H   | W =  -81.342 + 2.854H   | W =  -99.908 + 3.160H
 14 | W = -83.784 + 2.888H   | W = -76.749 + 2.775H   | W = -103.661 + 3.251H   | W = -126.446 + 3.633H

Age | Group A, Girls         | Group B, Girls         | Group C, Girls          | Group D, Girls
  6 | W = -14.561 + 1.329H   | W = -15.985 + 1.345H   | W =  -23.728 + 1.551H   | W =  -27.563 + 1.624H
  7 | W = -19.762 + 1.465H   | W = -24.130 + 1.556H   | W =  -29.312 + 1.693H   | W =  -30.226 + 1.694H
  8 | W = -20.721 + 1.503H   | W = -30.622 + 1.718H   | W =  -29.113 + 1.691H   | W =  -40.156 + 1.927H
  9 | W = -30.133 + 1.730H   | W = -29.081 + 1.709H   | W =  -38.624 + 1.917H   | W =  -45.066 + 2.045H
 10 | W = -36.478 + 1.878H   | W = -38.866 + 1.925H   | W =  -46.263 + 2.088H   | W =  -62.015 + 2.397H
 11 | W = -48.707 + 2.153H   | W = -43.005 + 2.034H   | W =  -51.146 + 2.209H   | W =  -57.754 + 2.330H
 12 | W = -58.277 + 2.360H   | W = -63.908 + 2.465H   | W =  -77.316 + 2.735H   | W =  -84.298 + 2.859H
 13 | W = -74.156 + 2.694H   | W = -88.043 + 2.939H   | W =  -83.960 + 2.892H   | W = -103.594 + 3.229H
 14 | W = -95.464 + 3.084H   | W = -84.496 + 2.906H   | W = -106.133 + 3.317H   | W = -134.197 + 3.804H

W is weight in lbs., H is height in inches.
The ages are central ages, and to obtain the weight corresponding to a given height
the child should be taken to the nearest whole year.


weight on age and not height on age which is non-linear. The departure from
linearity is not great, but Mr H. E. Soper, in order to smooth the material, fitted a
parabolic surface to the regression surface of weight on height and age.


Let W be as before the weight in lbs., H = height in inches, and y equal the
age of the child measured from 10*. Then

$$W = -\phi_1(y) + \phi_2(y)\,H$$

is the form of the surface when the relation of W to H for a given age is sensibly
linear.


Mr Soper now assumed:

$$\phi_1(y) = a_0 + a_1 y + a_2 y^2, \qquad \phi_2(y) = b_0 + b_1 y + b_2 y^2,$$

and determined $a_0$, $a_1$ and $a_2$, $b_0$, $b_1$ and $b_2$ so that:

$$\sum \{\, n\,(\phi_1(y) - a_0 - a_1 y - a_2 y^2)^2 \,\} = \text{minimum},$$

$$\sum \{\, n\,(\phi_2(y) - b_0 - b_1 y - b_2 y^2)^2 \,\} = \text{minimum},$$

where n is the number of individuals in any age group.


* Thus y takes every value from -4 to +4 and we have nine equations to deal with.
[Plate: Model of Regression Surface, giving mean weight of Girls of Class B of Glasgow Schools for a given
Height and Age. The mean weight is the vertical coordinate and each section parallel to the front
of the model gives the mean weight for the several Heights of Girls of a given Age. See p. 295.]
Now $\phi_1(y)$ and $\phi_2(y)$ for given y's are the values determined in Table I for
the constants at each age of the regression lines

$$W = -A + BH,$$

and n is the number of children dealt with at each age. Our type equations are
then of the form:

$$a_0 \sum(n) + a_1 \sum(ny) + a_2 \sum(ny^2) = \sum(nA),$$

$$a_0 \sum(ny) + a_1 \sum(ny^2) + a_2 \sum(ny^3) = \sum(nAy),$$

$$a_0 \sum(ny^2) + a_1 \sum(ny^3) + a_2 \sum(ny^4) = \sum(nAy^2),$$

and similar equations for $b_0$, $b_1$, $b_2$, with B for A.


When these constants had been determined we put Y = 10 + y and obtain the
equations given in Table II.
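
The fitting step can be sketched as a weighted least-squares parabola obtained from the three normal equations. This is an illustration only, not Mr Soper's actual computation: the function names are invented here, and because the numbers n in each age group are not printed in this part of the paper, the example uses equal weights as a loudly hypothetical stand-in.

```python
def weighted_quadratic_fit(ys, values, weights):
    """Fit c0 + c1*y + c2*y**2 by solving the weighted normal equations."""
    # Moments S[k] = sum(n * y**k) and right-hand sides T[k] = sum(n * v * y**k).
    S = [sum(n * y ** k for y, n in zip(ys, weights)) for k in range(5)]
    T = [sum(n * v * y ** k for y, v, n in zip(ys, values, weights))
         for k in range(3)]
    # Augmented 3x3 system, row i: S[i] c0 + S[i+1] c1 + S[i+2] c2 = T[i].
    M = [[S[i + j] for j in range(3)] + [T[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(3):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][3] for i in range(3)]


def shift_origin(coeffs, origin):
    """Re-express c0 + c1*y + c2*y**2 in terms of Y = origin + y."""
    c0, c1, c2 = coeffs
    return [c0 - c1 * origin + c2 * origin ** 2, c1 - 2 * c2 * origin, c2]


# Slopes B of the Table I lines for Boys, Group A, at ages 6 to 14.
ages_from_10 = list(range(-4, 5))
slopes_boys_a = [1.503, 1.519, 1.495, 1.635, 1.768, 2.302, 2.217, 2.547, 2.888]
hypothetical_n = [1] * 9  # ASSUMPTION: equal weights; true group sizes not given here
b_in_y = weighted_quadratic_fit(ages_from_10, slopes_boys_a, hypothetical_n)
b_in_true_age = shift_origin(b_in_y, 10)
```

With the true numbers of children in place of the equal weights, the same routine applied to the A and B columns of Table I should lead to constants of the kind given in Table II; with equal weights the result is only indicative.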


TABLE II.
Glasgow. School Children.

W = Weight in lbs.  H = Height in inches.  Y = True age.

Boys
Group A  W = {.02181Y² + 2.214 - .2554Y} × H - {1.13275Y² +  67.542 - 14.7417Y},
  „   B  W = {.01533Y² + 1.900 - .1493Y} × H - { .83314Y² +  53.101 -  9.8662Y},
  „   C  W = {.03990Y² + 3.614 - .5796Y} × H - {2.08397Y² + 139.456 - 31.6570Y},
  „   D  W = {.02983Y² + 2.799 - .3636Y} × H - {1.76624Y² + 111.407 - 24.0174Y}.

Girls
Group A  W = {.01880Y² + 1.657 - .1644Y} × H - { .96454Y² +  38.119 -  9.7305Y},
  „   B  W = {.02081Y² + 1.907 - .2043Y} × H - {1.16330Y² +  60.230 - 13.6925Y},
  „   C  W = {.02315Y² + 2.222 - .2457Y} × H - {1.21642Y² +  67.626 - 14.3416Y},
  „   D  W = {.02701Y² + 2.385 - .2832Y} × H - {1.56165Y² +  87.624 - 18.9239Y}.
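
The equations of Table II can be evaluated directly. The sketch below stores each group's constants as read from the table (slope bracket first, then the subtracted bracket) and returns the mean weight for a given true age and height; the dictionary layout and function name are illustrative.

```python
# Each entry: ((q, l, c) for the slope bracket, (q, l, c) for the subtracted
# bracket), where a bracket equals q*Y**2 + l*Y + c. Values as read from Table II.
TABLE_II = {
    ("boys", "A"): ((0.02181, -0.2554, 2.214), (1.13275, -14.7417, 67.542)),
    ("boys", "B"): ((0.01533, -0.1493, 1.900), (0.83314, -9.8662, 53.101)),
    ("boys", "C"): ((0.03990, -0.5796, 3.614), (2.08397, -31.6570, 139.456)),
    ("boys", "D"): ((0.02983, -0.3636, 2.799), (1.76624, -24.0174, 111.407)),
    ("girls", "A"): ((0.01880, -0.1644, 1.657), (0.96454, -9.7305, 38.119)),
    ("girls", "B"): ((0.02081, -0.2043, 1.907), (1.16330, -13.6925, 60.230)),
    ("girls", "C"): ((0.02315, -0.2457, 2.222), (1.21642, -14.3416, 67.626)),
    ("girls", "D"): ((0.02701, -0.2832, 2.385), (1.56165, -18.9239, 87.624)),
}


def mean_weight(sex, group, age, height):
    """Mean weight in lbs. for true age Y (years) and height H (inches)."""
    (s2, s1, s0), (i2, i1, i0) = TABLE_II[(sex, group)]
    slope = s2 * age ** 2 + s1 * age + s0        # lbs. per inch at this age
    subtracted = i2 * age ** 2 + i1 * age + i0   # constant term, entered with minus sign
    return slope * height - subtracted
```

The fit rests on ages 6 to 14, so the function should only be trusted for ages within about 5.5 to 14.5 years and for heights actually occurring at the age in question.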


A model of the surface for Glasgow Girls of Class B has been made by 
Mr Soper. Allowing for the points based on few observations at the end of each 
regression line of weight on height for constant age, the scroll represented by black 
threads is quite a good fit to the observations represented by card sections. The 
model is, however, difficult to photograph in a manner which shows effectively the 
approximation of the thread scroll to the cut card sections. The reader should 
note that an additional thread is placed between each of the threads which 
graduate the regression lines for the different ages. 


Further, eight tables (Tables α to θ) have been constructed in order that the
average weight of any boy or girl of a given height and age can be read off at once.
See pp. 300-303. It has been stated before that the age groups in Glasgow are
from 5.5 to 6.5 years etc., and that 6, 7, 8 etc. are the centres of each age group,
but since frequently the centres are at 6.5, 7.5 etc. we have constructed Table III
which enables anyone to find mean height and weight at any age between 5.5 and
14.5 years. The regression lines are calculated from the original tables in which
children of 4.5 to 5.5 years were included. The regression lines omitting the


TABLE III.
Glasgow.

                  |         Mean Height (ins.)        |         Mean Weight (lbs.)
Age               |    A   |    B   |    C   |    D   |    A   |    B   |    C   |    D
Boys:
 5.5 to  6.5      |  41.3  |  42.1  |  42.1  |  43.0  |  40.9  |  42.0  |  42.5  |  43.3
 6.5 to  7.5      |  43.0  |  44.0  |  44.0  |  44.8  |  44.2  |  45.6  |  45.9  |  46.6
 7.5 to  8.5      |  45.1  |  45.9  |  46.2  |  46.9  |  48.0  |  49.6  |  50.1  |  51.2
 8.5 to  9.5      |  47.0  |  47.7  |  48.1  |  49.0  |  52.3  |  53.9  |  54.4  |  56.3
 9.5 to 10.5      |  48.8  |  49.5  |  49.9  |  50.9  |  56.7  |  58.4  |  59.5  |  61.2
10.5 to 11.5      |  50.6  |  51.1  |  51.5  |  52.6  |  61.6  |  62.7  |  63.9  |  66.3
11.5 to 12.5      |  52.3  |  52.8  |  53.5  |  54.2  |  66.4  |  67.8  |  69.1  |  70.8
12.5 to 13.5      |  53.8  |  54.3  |  55.0  |  55.9  | [illegible] |  72.9  |  75.6  |  76.9
13.5 to 14.5      |  55.2  |  55.5  |  57.2  |  57.7  |  75.6  |  77.3  |  82.2  |  83.2
Regression on Age |  1.800 |  1.728 |  1.847 |  1.846 |  4.305 |  4.395 |  4.772 |  4.914

Girls:
 5.5 to  6.5      |  41.0  |  42.0  |  41.9  |  42.7  |  39.9  |  40.6  |  41.3  |  41.8
 6.5 to  7.5      |  42.9  |  43.7  |  43.7  |  44.8  |  43.0  |  43.9  |  44.7  |  45.6
 7.5 to  8.5      |  44.6  |  45.6  |  45.6  |  46.4  |  46.4  |  47.7  |  48.1  |  49.3
 8.5 to  9.5      |  46.6  |  47.4  |  47.6  |  48.6  |  50.5  |  51.8  |  52.7  |  54.3
 9.5 to 10.5      |  48.5  |  49.2  |  49.4  |  50.4  |  54.7  |  55.8  |  56.9  |  58.8
10.5 to 11.5      |  50.3  |  51.1  |  51.2  |  52.2  |  59.5  |  60.8  |  61.9  |  64.4
11.5 to 12.5      |  52.4  |  53.0  |  53.3  |  54.1  |  65.3  |  66.8  |  68.4  |  70.5
12.5 to 13.5      |  54.4  |  55.2  |  55.4  |  56.5  |  72.4  |  74.3  |  76.1  |  78.8
13.5 to 14.5      |  55.8  |  57.1  |  57.0  |  58.7  |  76.8  |  81.3  |  83.0  |  89.0
Regression on Age |  1.914 |  1.859 |  1.903 |  1.943 |  4.551 |  5.083 |  4.944 |  5.489

Regressions on age are in inches per year for height and lbs. per year for weight.


children of 4.5 to 5.5 years were worked out for height on age and weight on age
for boys in Group A, and were found to be 1.81 instead of 1.80 for height and
4.39 instead of 4.31 for weight, but such differences are not great enough to
matter and the remaining regression coefficients were not calculated with children
of five years excluded.
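
One way of using Table III for an age that is not a group centre is to start from the nearest central age and move along the regression on age. This linear adjustment is an assumption made here for illustration, and the names are invented; the figures are those of Girls, Group A.

```python
# Girls, Group A, from Table III: central age -> (mean height ins., mean weight lbs.)
GIRLS_A = {
    6: (41.0, 39.9), 7: (42.9, 43.0), 8: (44.6, 46.4),
    9: (46.6, 50.5), 10: (48.5, 54.7), 11: (50.3, 59.5),
    12: (52.4, 65.3), 13: (54.4, 72.4), 14: (55.8, 76.8),
}
HEIGHT_PER_YEAR = 1.914   # regression of height on age, inches per year
WEIGHT_PER_YEAR = 4.551   # regression of weight on age, lbs. per year


def mean_at_age(age):
    """Mean (height, weight) at a true age between 5.5 and 14.5 years.

    ASSUMPTION: linear adjustment from the nearest central age using the
    regression on age; this is an illustration, not a rule from the paper.
    """
    if not 5.5 <= age <= 14.5:
        raise ValueError("Table III covers ages 5.5 to 14.5 only")
    centre = min(14, max(6, int(age + 0.5)))
    height, weight = GIRLS_A[centre]
    shift = age - centre
    return height + shift * HEIGHT_PER_YEAR, weight + shift * WEIGHT_PER_YEAR
```

For a group centred at 6.5 years, for instance, the function steps back half a year from the entry for central age 7.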


In connection with the tables (1 to 72, pp. 304-339) it should be noted that in
transferring the data for boys from the original sheets to cards, .75 of an inch was
included in the inch above; for example, 30.75 inches was entered as 31 inches
and the centre of the group of 30 inches is 30.125 inches. The data for the girls
were transferred to cards much later and the simpler method was employed, and
30.75 was included in the 30 inch group and the centre of this group is 30.375.
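
The two grouping rules, and the group centres they imply, can be checked with a few lines. The sketch assumes heights recorded in exact quarter inches, as the instructions required; the function names are illustrative.

```python
import math
from statistics import mean

QUARTERS = (0.0, 0.25, 0.5, 0.75)


def boys_group(height):
    """Boys' cards: .75 of an inch is counted in the inch above."""
    return math.floor(height + 0.25)


def girls_group(height):
    """Girls' cards: every quarter stays with its own whole inch."""
    return math.floor(height)


def group_centre(inch, grouping):
    """Mean of the quarter-inch readings that fall in a given inch group."""
    candidates = [inch - 1 + q for q in QUARTERS] + [inch + q for q in QUARTERS]
    return mean(h for h in candidates if grouping(h) == inch)
```

For boys the 30 inch group holds 29.75, 30, 30.25 and 30.5, giving a centre of 30.125; for girls it holds 30, 30.25, 30.5 and 30.75, giving 30.375, in agreement with the text.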


Through the kindness of Dr Priestley, School Medical Officer for Staffordshire, 
we have been able to obtain the regression of Weight on Height for certain 
age groups of boys and girls in that county. Staffordshire is a county of very 
various occupations and contains an agricultural as well as a mining and factory 
population. 


The children measured are “entrants” and “leavers” and a further group of
children, namely those from 8 to 9, were measured. The “leavers” include
children of 12 to 14 years, “since in general the only ‘leavers’ at age 12 to 13
are rural, and the only ‘leavers’ at age 13 to 14 are urban*.”


The children were of the age stated, 5 and not yet 6, 8 and not yet 9, on
January 1, 1911, but the actual day of weighing may have been any school day from
January to December, so that a child entered as 8 may have been only a few days
short of 10 when it was actually measured, and therefore the mean age of the
group of children of 8 to 9 will be 9 years. “In the case of the group of leavers, 13
to 14, no child can have been more than 14, because on attaining that age the
children are entitled to leave school, and generally do leave. With these the mean
height and weight in our tables refer to the true mean of the years of the group,
viz. 13 and a half†.” We shall table to the middle of the group, namely at ages 6,
9, 13 and 13½.


The children were weighed and measured without shoes, but in ordinary indoor 
clothes. The figures were read to the nearest quarter of an inch and to the nearest 
quarter of a pound. 


Staffordshire Children.

      |                 GIRLS                   |                 BOYS
Ages  | Mean   | Mean   | Regression of Weight  | Mean   | Mean        | Regression of Weight
      | Height | Weight | on Height             | Height | Weight      | on Height
  6   |  41.9  |  39.8  |  1.705                |  42.1  |  41.0       |  1.741
  9   |  47.7  |  51.1  |  2.024                |  48.1  |  53.0       |  2.120
 13   |  56.7  |  78.0  |  3.272                |  55.8  |  75.3       |  2.811
 13½  |  57.1  |  81.0  |  3.360                |  56.3  | [illegible] |  3.166


It will be as well to compare these means with those for all Glasgow ; so far we 
have not given them in this paper for all the schools taken together but only for 
each school group. 


* Staffordshire County Council, Annual Report of the School Medical Officer for the Year 1911. 
J. and C. Mort, Ltd., 39, Greengate Street, Stafford, 1912. 
† Ibid. p. 25.
Glasgow Children.

      |      GIRLS      |      BOYS
Ages  | Mean   | Mean   | Mean   | Mean
      | Height | Weight | Height | Weight
  6   |  41.7  |  40.5  |  41.9  |  41.8
  9   |  47.3  |  51.3  |  47.7  |  53.7
 13   |  55.2  |  74.8  |  54.6  |  73.6


Girls in Staffordshire are taller at ages 6, 9, and 13 than girls in Glasgow, but
they are lighter at ages 6 and 9. We might argue from this a lack of physique in
Staffordshire girls, who are absolutely .7 lbs. lighter at age 6 than Glasgow children,
and relatively to their height even more than this amount. At age 9 the absolute
difference is less, and at age 13 Staffordshire girls are heavier than Glasgow girls,
but they are 1½ inches taller, and since the regression of weight on height at
age 13 for girls is 3.272 lbs. we should expect Staffordshire girls to be 4.9 lbs.
heavier than Glasgow girls, but they are not so much. I should hesitate to say
that the physique of Staffordshire girls is inferior to that of Glasgow girls; the
difference probably is one of race, but such questions must remain unsolved till we
have a far wider range of anthropometric data than is available at present for all
the districts of Great Britain. Boys show the same characteristics to a lesser
extent; Staffordshire boys are taller at ages 6, 9, and 13, but they are lighter in
weight; at age 6 they are .8 lbs. lighter than Glasgow boys; at age 9 they are .7 lbs.
lighter and at age 13 they are 1.7 lbs. heavier. Again relative to their height
Staffordshire boys are lighter than Glasgow boys at the three ages for which a
comparison can be made.
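
The comparison “relative to height” amounts to multiplying the difference in mean height by the regression of weight on height and setting the result against the observed difference in weight. A minimal sketch for girls of 13 follows, with the means as read from the tables of this paper; the names are illustrative.

```python
def expected_weight_gap(height_gap_in, regression_lbs_per_in):
    """Weight difference expected from a height difference alone."""
    return height_gap_in * regression_lbs_per_in


# Girls of 13: (mean height in inches, mean weight in lbs.)
staffordshire = (56.7, 78.0)
glasgow = (55.2, 74.8)
regression_girls_13 = 3.272  # lbs. per inch, Staffordshire girls of 13

height_gap = staffordshire[0] - glasgow[0]
expected = expected_weight_gap(height_gap, regression_girls_13)
observed = staffordshire[1] - glasgow[1]
shortfall = expected - observed  # positive: lighter than their height suggests
```

Here the expected excess is about 4.9 lbs. against an observed excess of about 3.2 lbs., which is the sense in which the Staffordshire girls are lighter for their height.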


Comparing boys and girls in Staffordshire we find that girls of 6 and 9 are
shorter and lighter than boys of the same age, but at 13 and 13½ girls are both
taller and heavier. At 6 and 9 years the regression of weight on height is
practically the same for both sexes, but at 13 and 13½ the regression of weight on
height is greater for girls than for boys; girls are heavier proportionally to their
height than boys are. For girls of 13 an additional inch in height should mean
3.3 lbs. more weight, while for boys the additional pounds expected are only 2.8;
for girls of 13½ we expect 3.4 lbs. increase for every inch of growth and for
boys 3.2 lbs. increase. A comparison of the regression coefficients with those given
for Glasgow in Table I will show that the coefficient is higher in Staffordshire for
children of 6 and boys of 9 than in any of the school groups in Glasgow. The
regression coefficient found for girls of 9 and 13 in Staffordshire is practically
identical with that found in Group D in Glasgow, and boys of 13 in Staffordshire
would seem to be most like boys of Group C in Glasgow.
In a Drapers’ Company Research Memoir* recently published tables are given
showing the height and weight of boys and girls of 12 to 13 years who were
members of the Worcestershire public elementary schools. These tables are
XLIV and LIX and will be found on pp. 100 and 107 of the work cited; we
have calculated the mean heights and weights and the regression coefficient of
weight on height as we have done for Glasgow and Staffordshire. The mean age
of the group of children of 12 to 13 years is 12.5 years, so allowance must be
made for the six months age difference in comparing with the Glasgow data. We
have already given the mean heights and weights of Glasgow and Staffordshire
boys and girls of age 13, so we will calculate what the height and weight of
Worcestershire children at age 13 would be. An additional year makes a difference
of roughly 1.9 inches and 4.9 lbs. in the height and weight of a girl and of
1.8 inches and 4.6 lbs. in the height and weight of a boy.


                      |                GIRLS                 |                BOYS
Age and district      | Mean        | Mean   | Regression of | Mean   | Mean   | Regression of
                      | Height      | Weight | Weight on     | Height | Weight | Weight on
                      |             |        | Height        |        |        | Height
12.5 Worcestershire   |  55.2       |  72.9  |  2.829        |  54.6  |  72.1  |  2.800
13   Worcestershire   | [illegible] |  75.8  |   -           |  55.5  |  74.4  |   -
13   Staffordshire    |  56.7       |  78.0† |   -           |  55.8  |  75.3  |   -
13   Glasgow          |  55.2       |  74.8  |   -           |  54.6  |  73.6  |   -


Worcestershire children are taller than Glasgow children but slightly shorter 
than Staffordshire children. They are also rather heavier than Glasgow children 
but not relatively to their height. The height of Worcestershire children of 
12.5 years is the same as the height of Glasgow children of 13 years, but the
weight of girls is 2 lbs. less and of boys is 1½ lbs. less. Worcestershire children
are lighter than Staffordshire children, but when allowance is made for the 
difference in height the Worcestershire children are not much at a disadvantage ; 
girls are a pound lighter and the weight of boys is practically the same. 


The differences we have found between the Worcestershire, Staffordshire and 
Glasgow children may well be due to differences of local race, and not be the 
results of differential environment or nurture. We should have little hesitation in 
applying the returns for Glasgow children as an approximate standard—say to the 
lb. and inch—for all British children of the artizan classes. 


* “A Statistical Study of Oral Temperatures in School Children with special reference to Parental
Environment and Class Differences,” by M. H. Williams, Julia Bell and Karl Pearson. Studies in
National Deterioration, IX. 1914. Dulau and Co., Ltd., 37, Soho Square, W.

† This weight appears somewhat exaggerated. It may in part be due to local differences in the
average ages of ‘leavers.’
TABLE α. GLASGOW. BOYS. GROUP A*.
Weights for Height at each Age. 
Actual Age. 
6 7 8 9 10 11 12 13 14 
Oo gees.5 = =| ea = — -- _ — 
34 | 30°0 | 31-0 — | = — — — — = 
86 Weslo i325 = == — = — — — 
Ae | SERB oe esi ip) — — — — 
oy, 34:4 35:4) |) = 35:8a — | = — —- | — 
38 35:8 36°9 | 37:4 3iE3 ne — — |) = 
39 37°3 38-4 39°0 | 39:0 38-4 | — —_ — — 
jo | 388 | 39°99 | 40° | 406 | 40:2 | 39:3 = — = 
Bt | 40:3) |) 14 498] AD 30 249 es eles = = = 
y2 40°7 | 499. | 43°71 “4ahOs |ex43-08 | BaB:4ye) w49-a = — 
48 | 43°2 | 44-4 | 45:2 | AB 7 4537 9) 45-4 |) 44-7 AO Gum 
44 | 44:6 | 45°99 | 46°83 | 47-49) 47-6" || 47-5 |) 47-0) 46-2 eee 
S| 258) 46s Ay a de Ae eo 494 | 49° | 49:3 | 48-7 | 47-9 
"oo | 46 | 47:6 | 48-90) 49:9: | 50-7 =| 5IS3: | 51-5) || (51-6) || IcSenenoee 
"©.| 4% | 491 | 50-4 |" 61-5 9) 5974 5351 11 653-6" | 538) | 53 :omen aa 
| 48 | 50S | 51:9 | 53-1 | 54: 54-9.) 556 | 56-1) 756 Deamenoee 
49 _ 53-4 | 54°6. | 55°83 |. 56:8 | 57-7 | 58:4 | 59-1 | 59°6 
50 -- 54°9 56°2 57:5 || 58:6 59°7 60°7 61°6 62°5 
51 = = 57°8 | 59-2 | 60:5 | 61:8 | 63:0 || 164-2, Senet 
52 = = 59°38 _| 60-8 | 62:3 |-63°8 | 65°3° || 66:8. j\m68:3 
53 dn a 60:9 | 62:5 | 64:2 | 65:8 | 67°6. | 69:40") sD 
54 | 64:2 | 66:0 | 67:99 | 69°9 | 72:0 | 74-1 
55 | 65:9 67°8 69°9 72°2 74°5 77°0 
| 86 | 67°6 69°7 72-0 745 77:1 79°9 
eae: 69:2 | 71:5: || 74:0 —| 716-7 | 70-7 aalesono 
eta te 2 alee tales 734 | 761 | 79:0 | 82-3 | 85-8 
59 fy aisle 81°3 84°9 88°7 
60 ae a 83:6 | 87-4 | 916 
61 : 90°0 | 94°5 
put = | : ae ae = 97-4 
TABLE β. GLASGOW. BOYS. GROUP B.
Weights for Height at each Age. 
Actual Age. 
| asta 7 8 9 10 u 12 13 U 
| 
[ego Sle 7c5 — | — = = = — = — 
34 | 29:0 — | — = = = — — — 
Sh. Nis80-C@ie ola pa ae = = — — = 
cies epi | Soni Wy: as — _ — 
73307 ile sasGe | reso = — — 
SBN) 8572) 13620 36.6 = — — — — — 
39. | "36°8 Nav:8, noes a a = — = = 
40 | 383 | 39°4-}-40:0 | 4071 | 39°8 — — — _ 
41 | 39-9 “| 40:0.) 47a S41) eae — — — — 
42 | -A0-5: |) 42:6) 4) AB *Ses | A387 eo 8 = —_ — 
8 | 43-0 | 44-9 4570 1 45-5 45-7 45-5 — — — 
yb | +446) | 45:8 9 467) 47 -Sie4 76am 47-6 — — — 
a | ae | 465 1 474 4849 40-159) 49-5) 40-7 eoes = = 
| 46) 47:7. | 49:02) 5051) 50297, 51-5) eS ma eee sel MD IEG — 
20) 47 | 49-2 | 50-6) | 51:8 | 52-7 | 253-40 653-0 erode | ede? cal serorad, 
| 48 | 508 | 522 | 53:5 | 54:5 | 55-4 | 560 | 565 | 567 | 568 
49 | 82:4 | 53:8 | 55-2 |) 56:3 | 5763 9) 87s | 68'S | 59:3 ailoore 
50 | 53°9 | 55:4 |.-56°8 -| 581 | 59:29) 60:2 ss6l-1 | Ci-8 mao 
51 — 57-0. | 58'5 | 59:9 || 61-20) (62745 63:4) |) 64-4en Ga 
52 a 58°7 |) 6020 Glave ose 645 | 65:7 | 66:9 | 68-1 
58 — = 61:9 | 63:5 | 651 | 66°6 | 68-1 | 69:5 | 70:9 
54 65°3 67:0 68°7 70-4 72°0 73°7 
55 6771) | G8r9= | 70:8. 7277 || e746 vers 
56 — = — —- 70°9 72:9 75°0 dell 79°3 
57 — = — -— 72°8 75°0 77°3 om 82:1 
58 Tel) 79261), (82520 ee beO 
59 a = | 79°3 | 81:9 | 84:8 | 87°8 
60 — = — | — —- — 84°3 87°3 90°6 
61 | = — 89°9 | 93-4 
62 | — — -- — 92°4 96°2 
63 | — —- 95°0 99°0 


* Throughout weights are given in lbs., heights in inches.
TABLE γ. GLASGOW. BOYS. GROUP C.
Weights for Height at each Age.
Actual Age.

Height 6 7 8 9 10 11 12 13 14
35 30°5 as = =— = = — a — 
36 SOE al ae = = = = = — 
37 33°7 36-0 — = = — — — — 
38 35:2 37°5 38-6 = = — — — a 
39 36°8 39-0 40°1 = = = = —~ = 
40 38-4 40°5 41°7 41°8 = = = = = 
41 40-0 42-0 43-2 43°5 = == = = = 
42 ALD 43°5 44°7 45‘1 44-7 = = = = 
43 43:1 45-0 46°3 46°7 46°5 = = = = 
th 44:7 46°6 47°8 484 48:3 47°5 ae = = 
45 46:2 48'1 49°3 50:0 501 49°6 48°5 = = 
4G 47'8 49-6 50°8 516 51:9 51°7 50:9 = — 
Wee 49-4 51-1 52°4 53-2 53-7 53°7 53°3 = = 
We 3 51:0 52°6 53-9 54:9 55-5 55°8 55:7 55-4 = 
2) _— 54:1 55-4 56°5 57°3 579 58-2 58-2 = 
"en | 50 _ 55‘6 57:0 58:1 | 59-1 59°9 60°6 51-0 as 
®| 51 ere ae 58°5 59:8 60:9 62:0 63-0 63°8 64:6 
a) 52 tes yes 60:0 | 61:4 | 62:7 | 641 | 65-4 | 66-7 | 67-9 
53 as = 61-6 63:0 64:5 66°1 67'8 69°5 71:2 
54 as = 63'1 64:7 66-4 68-2 70:2 72°3 74°6 
55 = aah == 66°3 68-2 70°3 72°6 75°1 77-9 
56 = os = ae 70-0 72°3 75-0 77-9 81:2 
5Y = a 74:4 77°4 80°8 84°5 
58 =e ae = aa a 76°5 79°8 83°6 87°8 
59 ane ii ese z 82-2 86:4 91:2 
60 a a8 846 89-2 94°5 
61 el rane = at 92+1 97°8 
62 as aes es diss 94:9 | 101°1 
63 = Se a = 97°7 | 104°4 
64 aos ae = = Zs Js = — | 10738 
65 as = z = <e = as = 111-1 
66 ate = A — | 114-4 
67 —* aE yey 
TABLE δ. GLASGOW. BOYS. GROUP D.
Weights for Height at each Age.
Actual Age.
Height 6 7 8 9 10 11 12 13 14
37 31°7 = = me: = ae = = 3 
38 33-4 =a aa we = me = Z te 
89 | 35'1 37°1 = _ = — — — — | 
40 36°7 38°8 39°6 = re: = = — — 
41 38-4 40°5 41:4 aa = 2 = — — 
42 40°1 422 43-2 43°3 =e = = — _ 
43 41°8 43-9 45-0 A Ole _ = — — 
dA 43°5 45°6 46°8 47°1 46°5 = =e — — 
45 | 452 | 47:3 | 486 | 491 | 48-7 = _ — = 
4b 46°9 49°1 50-4 51:0 | 50°8 49°8 = — — 
Ay 48°6 50°8 52-2 53-0 53-0 52-3 50:8 — — 
48 50°3 52-5 54:0 54:9 55'1 54:7 53-5 = a 
19 ae 542 55'8 56:9 57:3 57'1 56:3 54-9 = 
| 40 = 55:9 576 58'8 59°4 59°5 59-0 58-0 = 
Bo | ol a ze 59-4 60°7 61°6 61:9 61°7 61°1 59:9 
Ra D2. — =. 61-2 62°7 63°7 64:3 64:5 64:2 63°5 
| 53 = ae = 64°6 65'8 66°7 67:2 67°3 67:0 
54 66°6 68°0 69°] 69°9 70°4 70°6 
OMe es — SMES OR Me pOg emi. | WG. || o78:5° |) 97422 
56 va oe = a 72°3 73:9 75°4 76°6 Wiel 
5Y s = —_ cy ae 76:3 78°1 79°8 81:3 
58 a _ 78°7 80°8 82-9 84:8 
59 as =e uae = =o 81:2 83°6 86-0 88-4 
60 ets pla ms 86°3 89°1 91:9 
61 os ae ie ans aa = 89-0 92-2 95°5 
62 a = == Lee aes = — 95°3 99-0 
63 we a es rs = = fs 98:4 | 102°6 
64 an = = a ii Sy a == 106-1 
65 aa ae oo as = ze = te 109°7 
66 ba) oes — mY a = i _ 113°3 
67 23 = ct as he an aa = 116°8 
TABLE ε. GLASGOW. GIRLS. GROUP A.
Weights for Height at each Age.
Actual Age.

Height 6 7 8 9 10 11 12 13 14
33 | 30:0 — — — -- _ — — — 

8) 81:3" | 3ic3 = — — — — — _ 

35 32°7 32-7 3251 = — — — — — 

36 3420) |) adel 33°6 = — — = — — 

Bre W354) 85:5 35°2 34:2 — ~ = == = 

38 | 36°7 SycOresGen 35°9 — — — = — 

89 "| 38:1 3874: Wl 38:2 37°6 36°6 = = — — 

40 | 39°4 | 39:8 39°8 39°3 38°5 = st ae Es 

Ad 40°38 | 41-2 41°3 41°0 | 40:3 39°3 — — — 

By) “42-1 42°7 429 42°7 42°2 | 41:4 = _ — 

Galle Oreo) ececil 44:4 | 44:4 | 44-1 43°5 42-6 = — 

yh 44°38 | 45:5 46-0 46°] 46°0 | 45°6 45-0 = — 

3 | go | 46:2 47-0 W475 47°8 | 47-9 478 | 47-4 46°7 — 
| 46 47°5 48°4 49°1 49°5 49°8 499 49°8 49°4 — 
OU 7 = 49°8 50°6 51-2 51:7 52:0 52°1 52-1 = 
x 48 = 51:2 52:2 52:9 53°6 54:1 54°5 54:8 55:0 
49 = = 53°7 546 | 55-5 56°3 56°9 575 58-0 

50 == — 55-2 56°3 57-4 | 58-4 59:3 | 60-2 61-1 

51 = — = 58-0 59°3 60°5 61°7 62°9 64:1 
52 = — = 59°7 61-2 | 62:6 | 64:1 65°6 | 67:2 
53 — — = 61:4 63°71 64:7 66°5 68°3 70°2 
54 — = = = 65:0 | 66-9 68:9 71-0 73°2 
55 = — = = 66°8 69-0 Tes 1331 76°3 
56 ss = = = 68°7 fad 1377 7674 79°3 
57 = a 73:2 76°1 79°71 82°4 
58 Ss = — a — 75°4 78°5 81°8 85-4 
59 = — — 80°83 | 84°5 | 88-4 
60 — = — = = = SBE IE 91°5 
61 = a = oe 2 a as 89:9 | 94:5 
62 = = on = = a = 92°6 | 97-6 
63 = = = = = = —_ = 100°6 

TABLE ζ. GLASGOW. GIRLS. GROUP B.
Weights for Height at each Age.
Actual Age.

Height 6 7 8 9 10 11 12 13 14

33 | 272 — = — = = = — = 

3h 28°7 — = -- — _ — — — 

35 30°1 31-0 = — — — — — = 

| 36 31°5 32°5 as — = — — = =: 

3 33:0 | 34:0 34°2 — — — — — — 

38 344 35°5 35°8 35-4 — — — — — 

39 35°8 37-0 S74 eer oree 36-2 = — — — 

40 1 372 38°5 39:0 | 38-9 38-2 = = — — 

| 41 38°7 40:0 | 406 40°7 40°1 38-9 = - 

| 42 40°1 41°5 42°2 APACHE A9: OMe All = =~ = 

UB A Ate 430 | 43:8 44:2 44:0 | 43:3 | 42:0 = — 

Ah 43-0 44:4 45-4 45-9 45°9 45-4 | 44:4 = — 
ous 44-4 45°9 ATO: |, e477 47-9 47-6 | 46°9 — -- 
rots) 146 45°8 47-4 48°6 | 49:4 | 49°38 | 49:8 | 49:4 | 48-5 — 
2 4 47°3 48°9 50°3 51-2 51°8 52-0. 4 \5.51°3 51:3 == 
mr 48 48°7 50°4 51°9 52°9 53°7 542 | 54:3 | 54:0 — 
49 = 51-9 53°5 54°7 55°7 boo 56-7, 56°8 56:6 

50 = = 55°1 56°5 57°6 58°5 59-2 59°6 59-7 

51 = a 56°7 58-2 59°5 60°7 61°6 62°3 | 62-9 
52 = = a 60:0 | 61:5 62°9 | 64:1 65°1 66-0 

58 — = = 61°7 63:4 | 65:0 | 66:5 | 67:9) "edu 

54 ote Ze, = a 65:4 | 967-2 69-0 70°6 72:2 
55 = = = = 673 | 69:4 | -71-4 73°4 754 

56 = os = -- 69°3 TUGeal es Foro 76°2 78°5 

57 = = a — = foul 76°3 78°9 | 81-6 

58 = ae 75°9 78°8 81°7 84-7 

59 a — = = = 78:1 81:2 | 84:5 87-9 

60 = = a 83:7 87-2 91-0 

61 = = = — = — 86°1 90°0 | 94:1 

62 a = = = ae = = 92°8 | 97-2 

63 — = = — — — = = 100-4 


TABLE η. GLASGOW. GIRLS. GROUP C.
Weights for Height at each Age.
Actual Age.

Height 6 7 8 9 10 11 12 13 14
30°0 — == oes = a — aad ad 
315 = — — — = = = = 
33°1 33°7 = = ra a er aa aps 
34:7 35°3 35°3 — = = = = — 
36°3 37°0 37°0 — — = = = = 
37°9 38°6 38°8 38°3 —_ = = = ee 
39°4 40°2 40°5 40°2 — —_— _ = = 
TABLE 1. Glasgow. Height and Weight of Boys of Group A.
Weight of Boys of 5.5–6.5 years

TABLE 2. Glasgow. Height and Weight of Boys of Group A.
Weight of Boys of 6.5–7.5 years

TABLE 5. Glasgow. Height and Weight of Boys of Group A.
Weight of Boys of 9.5–10.5 years

TABLE 6. Glasgow. Height and Weight of Boys of Group A.
Weight of Boys of 10.5–11.5 years

TABLE 7. Glasgow. Height and Weight of Boys of Group A.
Weight of Boys of 11.5–12.5 years

TABLE 9. Glasgow. Height and Weight of Boys of Group A.
Weight of Boys of 13.5–14.5 years

Glasgow. Height and Weight of Boys of Group B.
Weight of Boys of 6.5–7.5 years

TABLE 13. Glasgow. Height and Weight of Boys of Group B.
Weight of Boys of 8.5–9.5 years

TABLE 14. Glasgow. Height and Weight of Boys of Group B.
Weight of Boys of 9.5–10.5 years

TABLE 15. Glasgow. Height and Weight of Boys of Group B.
Weight of Boys of 10.5–11.5 years

TABLE 17. Glasgow. Height and Weight of Boys of Group B.
Weight of Boys of 12.5–13.5 years

TABLE 18. Glasgow. Height and Weight of Boys of Group B.
Weight of Boys of 13.5–14.5 years

TABLE 19. Glasgow. Height and Weight of Boys of Group C.
Weight of Boys of 5.5–6.5 years

TABLE 20. Glasgow. Height and Weight of Boys
Weight of Boys of 6.5–7.5 years

TABLE 21. Glasgow. Height and Weight of Boys of Group C.

TABLE 23. Glasgow. Height and Weight of Boys of Group C.
Weight of Boys of 9.5–10.5 years

TABLE 25. Glasgow. Height and Weight of Boys of Group C.

TABLE 26. Glasgow. Height and Weight of Boys of Group C.

TABLE 29. Glasgow. Height and Weight of Boys of Group D.
Weight of Boys of 6.5–7.5 years

TABLE 30. Glasgow. Height and Weight of Boys of Group D.
Weight of Boys of 7.5–8.5 years

TABLE 31. Glasgow. Height and Weight of Boys of Group D.
Weight of Boys of 8.5–9.5 years

TABLE 32. Glasgow. Height and Weight of Boys of Group D.
Weight of Boys of 9.5–10.5 years


Height [inca ah fea eee ease eae | Totals 
: - ey ey > 9  \ yD 
3 2 

36 3 1 
oe 

3 ” 1 
43 5, 2 
44 55 8 
45 ” 2 8 
46 ,, 6 5 1 23 
Ty 8 8 Ale 3 48 
48 ,, 4 12 Bia eal 5D 
49 ,, 5 6% 20 15 94 
50 ,, 2 22 26 115 
61, | —— — — — 1 917 84 
52 ” AO OMA SeG 453) — 60 
Dy 3 40 
54 ” 1 12 
55 ,, 2 
56 ” 3 
Siew. 1 
58 ,, 

59 ” 

BO 

Cn. 1 


Totals} 4 8 10 24 27 34 68 50 6: 52 45 25 ¢ 558 


320 Height and Weight of School Children in Glasgow 


TABLE 33. Glasgow. Height and Weight of Boys of Group D.
Weight of Boys of 10.5–11.5 years

TABLE 34. Glasgow. Height and Weight of Boys of Group D.
Weight of Boys of 11.5–12.5 years

TABLE 35. Glasgow. Height and Weight of Boys of Group D.
Weight of Boys of 12.5–13.5 years

TABLE 37. Glasgow. Height and Weight of Girls of Group A.
Weight of Girls of 5.5–6.5 years

TABLE 38. Glasgow. Height and Weight of Girls of Group A.

TABLE 39. Glasgow. Height and Weight of Girls of Group A.
Weight of Girls of 7.5–8.5 years

TABLE 40. Glasgow. Height and Weight of Girls of Group A.

TABLE 41. Glasgow. Height and Weight of Girls of Group A.

Height and Weight of Girls
Weight of Girls of 10.5–11.5 years

TABLE 43. Glasgow. Height and Weight of Girls of Group A.
Weight of Girls of 11.5–12.5 years
Height Totals

TABLE 44. Glasgow. Height and Weight

TABLE 45. Glasgow. Height and Weight of Girls of Group A.
Weight of Girls of 13.5–14.5 years

TABLE 47. Glasgow. Height and Weight of Girls of Group B.

TABLE 48. Glasgow. Height and Weight of Girls of Group B.

TABLE 49. Glasgow. Height and Weight of Girls of Group B.

TABLE 51. Glasgow. Height and Weight of Girls of Group B.

TABLE 53. Glasgow. Height and Weight of Girls of Group B.

TABLE 55. Glasgow. Height and Weight of Girls of Group C.

TABLE 56. Glasgow. Height and Weight of Girls of Group C.

TABLE 57. Glasgow. Height and Weight of Girls of Group C.

TABLE 59. Glasgow. Height and Weight of Girls of Group C.
Weight of Girls of 9.5–10.5 years

Glasgow. Height and Weight of Girls

TABLE 61. Glasgow. Height and Weight of Girls of Group C.
Weight of Girls of 11.5–12.5 years

TABLE 63. Glasgow. Height and Weight of Girls of Group C.
Weight of Girls of 13.5–14.5 years

TABLE 64. Glasgow. Height and Weight of Girls of Group D.

Glasgow. Height and Weight of Girls of Group D.

TABLE 67. Glasgow. Height and Weight of Girls of Group D.
Weight of Girls of 8.5–9.5 years

TABLE 68. Glasgow. Height and Weight of Girls of Group D.

TABLE 69. Glasgow. Height and Weight of Girls of Group D.
Weight of Girls of 10.5–11.5 years

TABLE 71. Glasgow. Height and Weight of Girls of Group D.
Weight of Girls of 12.5–13.5 years
NUMERICAL ILLUSTRATIONS OF THE VARIATE
DIFFERENCE CORRELATION METHOD.


By BEATRICE M. CAVE AND KARL PEARSON, F.R.S.


In 1904 Miss F. E. Cave in a memoir on the correlation of barometric heights,
published in the R. S. Proc. Vol. LXXIV. pp. 407 et seq., endeavoured to get rid of
seasonal change by correlating first differences of daily readings at two stations.
A similar method was used by Mr R. H. Hooker in a paper published some time
later in the Journal of the Royal Statistical Society, Vol. LXVII. pp. 396 et seq.,
1905. This method was generalised by “Student” in the last number of
Biometrika (Vol. X. pp. 179, 180). He showed that if there were two variates
$x$ and $y$, such that

$$x = \phi(t) + X,$$
$$y = f(t) + Y,$$

where $X$ and $Y$ are the parts of $x$ and $y$ independent of the time $t$, then the
spurious correlation arising from $x$ and $y$ being both functions of the time could
be got rid of by correlating the differences of $x$ and $y$, and that ultimately, when
$m$ is sufficiently large:

$$r_{\Delta^m x\,\Delta^m y} = r_{\Delta^{m+1} x\,\Delta^{m+1} y} = \text{etc.} = r_{XY},$$

so that the correlation of $X$ and $Y$, free from the spurious time (or it might be
position) correlation, i.e. $r_{XY}$, could be found by correlating the successive
differences of $x$ and $y$. When the correlations of the differences remain steady for
several successive values, then we may reasonably suppose that we have reached
the correlation $r_{XY}$*.

This method is still further developed by Dr Anderson of Petrograd, who in
a valuable memoir published in this Journal has provided the probable errors of
the successive difference correlations of a system of variables:

$$X_1, X_2, X_3, \ldots, X_n,$$
$$Y_1, Y_2, Y_3, \ldots, Y_n,$$


* Having been in communication with “Student,” while he was writing his paper, I know that the
interpretation put by Dr Anderson (Biometrika, Vol. X. p. 279) on “Student’s” words (Ibid. p. 180) is
incorrect. “Student” had in mind, if he did not clearly express it, the ultimate steadiness of
$r_{\Delta^m x\,\Delta^m y}$ for a succession of values of $m$. K. P.
where the correlations of random pairs of values of the variates, or the product sums,

$$S\{(X_s - \bar{X})(X_{s'} - \bar{X})\},$$
$$S\{(Y_s - \bar{Y})(Y_{s'} - \bar{Y})\},$$

are both zero.


Dr Anderson has further provided us with the values of the standard deviations
of the successive differences, i.e.

$$\sigma_{\Delta^m X} \text{ and } \sigma_{\Delta^m Y},$$

which represent the ultimate values of $\sigma_{\Delta^m x}$ and $\sigma_{\Delta^m y}$, when we have carried $m$ so
far that the time effect has been eliminated.


The new method appears to be one of very great importance, and like many 
new methods it has been developed in a co-operative manner, which is a good 
reason for not entitling it by the name of any single contributor. We prefer to 
term it the Variate Difference Correlation Method. 


With the exception of a few illustrations given by “Student,” no numerical 
work on the correlation of the higher differences has yet been attempted. It is 
clear that much numerical work will have to be undertaken before we can feel 
complete confidence in our knowledge of the range and of the limitations of the 
new method. We have yet to ascertain how far in different types of material 
a real stability of difference correlations is ultimately reached, and how far 
various assumptions made in the course of the fundamental demonstration apply 
in dealing practically with actual statistical data. One of the most important 
assumptions made if there be n values of the variates is that arising from the 
reduction in the number of values as we take the means which occur in successive 
differences, and a like assumption is made in the case of standard deviations. 
Thus for example:

$$\frac{1}{n} S_1^{n}(X_s) = \bar{X},$$

but $\frac{1}{n-1} S_1^{n-1}(X_s - X_{s+1}) = (X_1 - X_n)/(n-1)$, and will not be sensibly zero, although
it is assumed to be, unless $n$ be very large. Similar remarks apply to the sums used
in the standard deviations, i.e. we assume in the proof

$$\frac{1}{n-1} S_1^{n-1}(X_s^2) = \frac{1}{n-1} S_2^{n}(X_s^2).$$

Ultimately with the $m$th differences we come in the proof to relations of the
type

$$\frac{1}{n-m} S_1^{n-m}(X_s) = \frac{1}{n-m} S_{m+1}^{n}(X_s)$$

and

$$\frac{1}{n-m} S_1^{n-m}(X_s^2) = \frac{1}{n-m} S_{m+1}^{n}(X_s^2).$$
Now such relations will undoubtedly be very approximately true, if the X’s
are random variates uncorrelated to each other, and provided m is small compared
with n. These conditions seem amply satisfied when we proceed to fourth or sixth
differences in barometric pressures, taken, say, over ten or twelve years; the 
addition of four or five daily pressures will hardly affect sensibly either the mean 
or the standard deviation. But such extensive data, while not only involving a great 
deal of labour in the difference work* are not those which, perhaps, most frequently 
demand the attention of the statistician, whether he be economist, sociologist 
or a student of scientific agriculture. In such cases it not infrequently happens 
that the available data only provide a range of 20 to, perhaps, at most 50 years ; 
and we need to discover whether there is a true relationship between our variates, 
apart from a continuous change in both due to the time factor. At present 
accurate statistics of annual trade or revenue, or satisfactory annual demographic 
data hardly extend at most beyond a period of 50 years. Very often—under 
even approximately like methods of record—we shall hardly have more than 
twenty years’ trustworthy returns. Not only has the method of record been 
changed, but the conditions of transit and trade may have been immensely 
modified and in a manner which we could not suppose to be even approximately 
represented by a continuous function of the time. 


The object of the present paper is to illustrate the theory of the variate 
difference correlation method in its present stage of development on a short series 
of economic data, in order to test what approximation there is in such short series 
to stability, and further how nearly Dr Anderson’s values for the successive 
standard deviations apply to such cases. We have selected as our data ten 
economic indices of Italian prosperity for the years 1885 to 1912, together with a 
“Synthetic Index,” formed by taking the arithmetic mean of the ten economic 
indices referred to. These eleven indices are given by Professor Georgio Mortara 
in an interesting memoir: “Sintomi statistici delle condizioni economiche
d’Italia” which was published in the Giornale degli Economisti e Rivista di
Statistica, for February, 1914, and form Tabella I. of that memoir, which we
here reproduce in part as Table I. The indices in each case are obtained by 
dividing the returns for any year by the means of the returns for the years 
1901—05, inclusive, and multiplying as usual by 100. 


The indices are for returns of (i) Gross Receipts of Railways, (ii) Shipping,
loaded and unloaded at the ports, (iii) Effective Revenue of the State, (iv) International
Commerce, Value of Imports and Exports, (v) Number of postal
Letters and private Telegrams, (vi) Amount of Stamp Duties, (vii) Savings Banks’
Returns, (viii) Importation of Coal, (ix) Gross returns of consumption of Tobacco,
(x) Returns of Coffee imported. Professor Mortara has drawn attention to the
very high correlations of these individual indices with each other, and of each of 
them with the “Synthetic Index.” The latter correlation is, however, to a certain 


* A discussion of the correlations of the higher differences in barometric pressures will we hope be 
shortly issued. 


extent spurious. For if $I_1, I_2, \ldots I_{10}$ be the individual indices and $I_s$ the synthetic
index, then $I_s = \frac{1}{10}(I_1 + I_2 + \ldots + I_{10})$ and any individual index $I_q$, if there be no
correlation between such individual indices, would give

$$\frac{1}{n} S(I_s - \bar{I}_s)(I_q - \bar{I}_q) = \frac{1}{10}\,\frac{1}{n} S(I_q - \bar{I}_q)^2 = \frac{1}{10}\sigma_q^2,$$

$$\sigma_s^2 = \frac{1}{n} S(I_s - \bar{I}_s)^2 = \frac{1}{100}(\sigma_1^2 + \sigma_2^2 + \ldots + \sigma_{10}^2),$$

and accordingly

$$r_{sq} = \frac{\frac{1}{10}\sigma_q^2}{\sigma_s \sigma_q} = \frac{\sigma_q}{\sqrt{\sigma_1^2 + \sigma_2^2 + \ldots + \sigma_{10}^2}},$$

TABLE I. 


Professor Mortara’s Table of Index Values of Italy. 
Numerical Index (Mean 1901—1905) = 100. 


Year | Railways | Shipping | Revenue | Commerce | Post | Stamp Duties | Savings | Coal | Tobacco | Coffee | Synthetic Index
1885 61 |-- 63 78 72 | 38 82 47 53 82 98 67'4 
1886 62 63 |. 79 | 74 40 87 53 52 86 94 69-0 
1887 68 73 | 82 78 42 94 BY 64 88 91 73°5 
1888 Omg ples” 83 =)" 162 43 98 57 69 86 81 72:0 
1889 71 Me | 285 70 45 99 58—| 71 86 78 74:0 
1890 (2 78 | 86 66 46 97 59 | 78 87 81 75°0 
feomaneer2 72 | 85 \| 60 |: 48 | 96 | 60 | vo | 88 | 80 73°1 
1892 lye 7 86 | 64 bl 96 64 69 89 80 745 
1893 "On 970 8 | 64 | 55 95 65 66 90,5], 73 Toes 
1894 72 a 72, 86 | 63 57 93 65 84 89 71 75'2 
1895 73 76 | 89 | 66 60 92 68 a 88 70 759 | 
1896 75 ves 90 | 67 64 94 70 rp So ee Tec 
1897 78 80 90 | 68 | 68 96 73e |e 76 Site (ro 79:1 
1898 81 84 STE aan 183 96 75 | 79 | 89 78 Sp | 
1899 86 88 | 93 | 88 75 96 70. iN 87 \ar91 82 | 86°5 
1900 89 89 | 94 | 91 78 96 83 | 88 92 82 | 88:2 
NOON |) 90" <I, Ol" “96. 1h 92. | 852 || (96."| 87.4 86 | 95 | 92 91-0 
1902 967 |) 99" | 198 95 | 93° | 97 92 | 96 | 97 | 94 95°7 
1903)" | 10 | 103 | 99° | 99 | 103 | 99 99 99 | 99 | 102 | 100°3 
1904 | 105 |-102 | 101 | 103 | 111 | 101 | 107 | 105 | 102 | 103 104°0 
WoOpsaiel0s | 105" 105 | 10k | 0S" i07 | tts | 114 | 106 | 109 108°8 
1906 | 119 | 123 | 108 | 132 | 107 | 115 | 127 | 137 | 109 | 118 119°5 | 
1907 | 126 | 125 | 108 | 144 | 114 | 120 | 144 | 148 |,115 | 125 126°9 
1908 | 134 | 129 | 113. | 139 | 122 | 120 | 155° | 150. | 124 | 132 131°8 
NOOR 139) 40 121 } 149° 132 193, 168 | 165° | 131 | 140 140°8 
1910 | 146 | 146 | 129 | 159 | 140 | 130 | 180 | 166 | 138 | 147 148°1 
HOU \al56 156: | 135 | 167" 1149 \136 [18% | W71 | 144. | 153 155°3 
1912 | 165 | 169 | 188 | 179 | 158 | 141 | 192 | 179 | 151 | 160 163-2 
{ | | | | 

Mean | 94.8 | 116.3 | 97.6 | 96.4 | 82.3 | 103.3 | 95.9 | 99.0 | 100.6 | 98.6 | 96.5
which would have a substantial, but spurious value amounting to .316 if
$\sigma_1 = \sigma_2 = \ldots = \sigma_{10}$. If there be high correlation between the individual indices, of
course, the correlation of each individual index with the arithmetic mean or
synthetic index will also be high. Thus our Table IV shows it to range from .952
in the case of Coffee to .998 in the case of Railways. Possibly a third of this
correlation may in some cases have a spurious origin. But the individual indices
are very highly correlated together; only two such correlations are below .9, and
the lower of these two is as high as .885. We are accordingly left with fifty
correlations ranging from .885 to .997 between the individual indices, and if we
accept these as true measures, then it is clear that any one of these ten indices 
might be used as a reasonable index of Italian prosperity; it would for practical 
purposes be idle to calculate them all or to table their arithmetic mean. 
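
The size of this spurious element may be checked by direct arithmetic. The following is only an illustrative sketch in Python, not part of the original computations: it assumes mutually uncorrelated indices, and the function name `spurious_r` is our own. It evaluates the correlation of one index with the arithmetic mean of all, namely the standard deviation of that index divided by the square root of the sum of the squared standard deviations.

```python
import math

def spurious_r(sigmas, q):
    """Correlation of index q with the arithmetic mean of all the indices,
    on the assumption that the indices are mutually uncorrelated.

    sigmas : standard deviations of the individual indices
    q      : position (0-based) of the index of interest
    """
    # r = sigma_q / sqrt(sigma_1^2 + ... + sigma_k^2)
    return sigmas[q] / math.sqrt(sum(s * s for s in sigmas))

# Ten indices of equal variability give r = 1/sqrt(10).
equal_case = spurious_r([1.0] * 10, 0)
print(round(equal_case, 3))
```

With ten indices of equal standard deviation the result is the .316 quoted in the text; an index more variable than its fellows would show a larger spurious correlation with the mean.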


But the high correlations found lay themselves open from our present standpoint
to some suspicion of being solely due to the fact that during the 28 years
under consideration Italy has progressively increased in population and accordingly
the consumption of innumerable goods and the means of interchange
have all grown together with the time. In other words the correlations we give 
under the heading of “quantities” in each separate section of Table IV are very 
high solely because the individual indices are variates increasing one and all as 
continuous functions of the time. 


The material therefore seems especially suited to the application of the
variate difference correlation method. For example, the correlation between the
indices for tobacco and savings is .984; are we to interpret this to signify, that,
if there are large savings this means that much will be spent on tobacco? Or,
is this high correlation simply in whole or part spurious, merely indicating that
both savings and consumption of tobacco increased markedly with the time?
Actually the correlation of first differences drops from .984 to .766, that of
second differences is negative if insensible, while from there onwards it steadily
increases negatively, till with the sixth differences we reach -.431, which seems
to indicate that, when time has been eliminated, expenditure on tobacco in any
year means less money saved. Again the coffee and tobacco indices appear very
highly correlated, .955, but by the third difference correlation we have reached
about a third of the relationship, .319, which is scarcely altered in the sixth
difference correlation, .326; we may assume therefore that there is probably a
moderate “organic” relationship between the expenditures on coffee and tobacco,
but the association is nothing like as close as would be suggested by the correlation
of the raw indices.
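
The behaviour described above can be imitated on artificial material. The sketch below is a minimal illustration in Python under stated assumptions, not a reproduction of the Italian data: the two series are hypothetical, sharing a quadratic time trend while their random parts are exactly opposed, and the helper names `differences` and `correlation` are our own.

```python
import math
import random

def differences(series, m):
    """Return the m-th successive differences of a series (s_i - s_{i+1})."""
    out = list(series)
    for _ in range(m):
        out = [a - b for a, b in zip(out, out[1:])]
    return out

def correlation(u, v):
    """Product-moment correlation of two equal-length series."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

# Hypothetical data: 28 "years", a common quadratic time trend,
# and random parts that are exactly opposed (Y = -X).
rng = random.Random(0)
t = range(28)
X = [rng.uniform(-1.0, 1.0) for _ in t]
x = [ti * ti + Xi for ti, Xi in zip(t, X)]
y = [ti * ti - Xi for ti, Xi in zip(t, X)]

# Correlation of the quantities (m = 0) and of their first six differences.
r_by_order = [correlation(differences(x, m), differences(y, m)) for m in range(7)]
for m, r in enumerate(r_by_order):
    print(m, round(r, 3))
```

The raw series are almost perfectly correlated because both follow the trend. Once the differences of the trend have become constant, here from the second difference onward, the correlation settles at the value belonging to the random parts alone, which in this artificial case is perfect negative correlation.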


The work has been done in the following manner: The successive differences 
of the indices up to the sixth were found. The means and standard deviations 
of these differences were calculated, and the correlations were then worked out 
in the product-moment manner. This involved the very laborious work of 
determining 385 coefficients, and then to these coefficients were added the
probable errors as found by Dr Anderson’s formulae. These probable errors are
of course those of the correlations of $\Delta^m X$ and $\Delta^m Y$, and will not be the correct
values for the probable errors of the correlations of $\Delta^m x$ and $\Delta^m y$, until $\Delta^m x = \Delta^m X$
and $\Delta^m y = \Delta^m Y$, i.e. until $m$ is sufficiently great for $t$ to have been eliminated.




Further their accuracy depends on the vanishing of the means of the differences
or on the equalities of the sums like

$$\frac{1}{n-1} S_1^{n-1}(X_s) = \frac{1}{n-1} S_2^{n}(X_s), \text{ etc.,}$$

which, while true on the average, will only be approximately true in the actual
instance if $n$ be large. We give in Table II the Mean Values of Index
Differences.
TABLE II.
Mean Values of Indices and their Differences.

| Railways | Shipping | Revenue | Commerce | Post | Stamp Duties | Savings | Coal | Tobacco | Coffee | Synthetic Index
Quantity | 94.8 | 116.3 | 97.6 | 96.4 | 82.3 | 103.3 | 95.9 | 99.0 | 100.6 | 98.6 | 96.5
1st difference | -3.85 | -3.93 | -2.22 | -3.96 | -4.44 | -2.19 | -5.37 | -4.67 | -2.56 | -2.30 | -3.55
2nd difference | -.35 | -.50 | -.08 | -.38 | -.27 | .00 | +.04 | -.35 | -.12 | -.42 | -.24
3rd difference | +.16 | +.28 | +.20 | -.08 | .00 | +.12 | -.08 | +.40 | -.12 | .00 | +.09
4th difference | -.33 | -.88 | -.13 | -1.17 | .00 | -.28 | +.04 | -.79 | -.16 | -.42 | -.40
5th difference | +.67 | +2.26 | +.26 | +2.52 | +.30 | .00 | -.52 | +1.87 | +.22 | +.87 | +.83
6th 5 SCD Ne 2L00 F008) 735) — 05 |4+2°00;-— °95)-168) -—1°10 


It will be seen from this table that the means of the differences are far from 
zero even when we have reached a difference for which we may suppose the time 
to have been eliminated. This arises from the smallness of the series dealt with 
and shows us that we ought not to anticipate more than a rough accordance with 
theory, or only an approximate steadiness, for sums like

$$\frac{1}{n-m} S_1^{n-m}(X_s)$$

may grow less and less steady as $m$ increases.
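
The failure of the mean first difference to vanish follows at once from the end values of a series, since the sum of the first differences telescopes. The sketch below, in Python, is offered only as a check under that identity; the function names are our own, and the intermediate values of the made-up series are arbitrary. It uses the end values of the Railways index in Table I, 61 for 1885 and 165 for 1912.

```python
def mean_first_difference(series):
    """Mean of the first differences X_s - X_{s+1} of a series."""
    diffs = [a - b for a, b in zip(series, series[1:])]
    return sum(diffs) / len(diffs)

def telescoped_mean(first, last, n):
    """The same mean obtained from the end values only: (X_1 - X_n)/(n - 1)."""
    return (first - last) / (n - 1)

# End values of the Railways index in Table I: 61 (1885) and 165 (1912), n = 28.
railways_mean = telescoped_mean(61, 165, 28)
print(round(railways_mean, 2))

# Any series with the same end values gives the same mean first difference;
# the intermediate values here are made up purely for the check.
made_up = [61] + [100] * 26 + [165]
print(abs(mean_first_difference(made_up) - railways_mean) < 1e-12)
```

The value so found agrees with the first entry of the row of first differences in Table II, and with only 28 terms it is plainly not negligible.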


Similar considerations apply to the standard deviations of the differences.
These will not at first obey Dr Anderson’s formulae because they are the values
for $\sigma_{\Delta^m X}$ and $\sigma_{\Delta^m Y}$; and when we have taken $m$ sufficiently high for
$\sigma_{\Delta^m x}$ and $\sigma_{\Delta^m y}$ to be theoretically equal to $\sigma_{\Delta^m X}$ and $\sigma_{\Delta^m Y}$,
and accordingly the correlation should have begun to be steady, then some failure
to obey Dr Anderson’s formulae will arise, because the means of the differences
are not truly zero and equalities of the type

$$\frac{1}{n-1} S_1^{n-1}(X_s^2) = \frac{1}{n-1} S_2^{n}(X_s^2), \text{ etc.,}$$

will not be satisfied when $n$ is relatively small.


Dr Anderson gives the value of $\sigma^2_{\Delta^m X}$ in terms of $\sigma^2_X$, but we do not of course
know $\sigma^2_X$, which will be very different from $\sigma^2_x$, and can only practically be found
from the value of $\sigma^2_{\Delta^m x}$ itself, after that value has become equal to $\sigma^2_{\Delta^m X}$, i.e. after
steadiness has set in. In order therefore to test the formulae we have formed the
ratio of

$$\sigma^2_{\Delta^m x} / \sigma^2_{\Delta^{m-1} x}, \text{ from } m = 1 \text{ to } 6.$$

This equals $\frac{4m-2}{m}$ in Dr Anderson’s formulae for $\sigma^2_{\Delta^m X}/\sigma^2_{\Delta^{m-1} X}$, and therefore
we have a good measure of the approach of $\Delta^m x$ to $\Delta^m X$, or of the growth of
steadiness as apart from the correlations. The following Table III gives the
values of the ratios of the squares of the standard deviations, theoretical values, 
actual values and the mean value for each of the differences of the ten 
individual indices. 


TABLE III.
Values of $\sigma^2_{\Delta^m x}/\sigma^2_{\Delta^{m-1} x}$ and their approach to $\frac{4m-2}{m}$.

m | Theoretical Value | Synthetic Index | Railways | Shipping | Revenue | Commerce | Post | Stamp Duties | Savings | Coal | Tobacco | Coffee | Mean of 10 Index Difference Standard Deviation Ratios
1 | 2 | .012 | .012 | .031 | .019 | .038 | .009 | .040 | .010 | .035 | .022 | .036 | .025
2 | 3 | .705 | .708 | 1.834 | .763 | 1.720 | .799 | .585 | .350 | 2.074 | .352 | .843 | 1.003
3 | 3.333 | 3.107 | 2.816 | 3.093 | 2.124 | 3.032 | 1.959 | 1.660 | 2.214 | 3.075 | 2.213 | 3.307 | 2.549
4 | 3.5 | 3.167 | 3.128 | 3.174 | 2.747 | 3.213 | 2.597 | 2.008 | 3.106 | 3.379 | 3.025 | 3.619 | 3.000
5 | 3.6 | 3.143 | 3.449 | 3.189 | 3.020 | 3.104 | 3.010 | 2.328 | 3.275 | 3.580 | 3.117 | 3.701 | 3.177
6 | 3.667 | 3.149 | 3.711 | 3.195 | 3.164 | 2.881 | 3.208 | 2.499 | 3.455 | 3.682 | 3.101 | 3.791 | 3.269


It will be clear that until we reach the ratio of the square standard deviations
of the third and second differences, there is no general approach to steadiness.
After m = 3, however, for m = 4, 5 and 6, the ratio of the values for the mean of the
series of individual indices to the theoretical value is .86, .88 and .89, respectively.
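
The theoretical column and the three ratios just quoted may be verified as follows. This is a sketch in Python under an assumption we state plainly: that for uncorrelated variates the variance of the $m$th difference is the binomial coefficient $\binom{2m}{m}$ times the variance of the variate, which is the usual form of the result and leads to the ratio $(4m-2)/m$. The observed means are those of the last column of Table III; the names used are our own.

```python
from math import comb

def theoretical_ratio(m):
    """Ratio of the variances of the m-th and (m-1)-th differences of a
    series of uncorrelated variates, taking var(m-th difference) as
    C(2m, m) times the variance of the variate itself."""
    return comb(2 * m, m) / comb(2 * m - 2, m - 1)

theory = {m: theoretical_ratio(m) for m in range(1, 7)}

# Mean observed ratios for the ten individual indices, from Table III.
observed_mean = {4: 3.000, 5: 3.177, 6: 3.269}
agreement = {m: observed_mean[m] / theory[m] for m in observed_mean}

# Theoretical ratio from the binomial coefficients beside (4m - 2)/m.
for m in range(1, 7):
    print(m, round(theory[m], 3), round((4 * m - 2) / m, 3))
print({m: round(v, 2) for m, v in agreement.items()})
```

The two forms of the theoretical ratio coincide for every $m$ from 1 to 6, and the observed means stand to them in the proportions .86, .88 and .89 given above.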


Thus, there is increasing approach to agreement in the observed and theoretical 
values, but the approach is slow, and we believe that there is greater steadiness than 
is really indicated by this test. The source of this apparent unsteadiness lies, we
think, in the relative largeness of m compared with n (i.e. at a maximum 6 as
compared with 28), rather than in our not having taken sufficiently high 
differences. 


We now turn to the correlations. These are given in Table IV, the actual 
values of the standard deviations of the quantities and their differences being 
recorded along the diagonal cells, while the other cells contain the correlations of 
each pair of variates and of their successive differences. 


We will now consider these correlations in detail. 
(a) Synthetic Index (Arithmetic Mean) with Individual Indices. 


We see at once that the Synthetic Index is highly associated with Shipping
(> .85), with Importation of Coal (? > .75), and International Commerce (? < .68),
and fairly highly with Revenue (c. .55). On the other hand the sixth difference
correlations with Post (c. .15), with Stamp Duties (c. .24) and with Savings (> .23)
are all such that (i) they might well have arisen from the spurious element in the
Synthetic Index correlations, and (ii) they are all less than their Andersonian (steady
value) probable errors. Almost the same may be said of the Railway Index; it
is not beyond suspicion of being spurious, and is scarcely significant having regard
to its probable error. The Consumption of Coffee is also not very closely associated
with the Synthetic Index; it is only about twice its probable error (.427 ± .205),
and a good deal of its value may be spurious. Further, in the case of both
Coffee and Railways, the correlations are still falling between .04 and .05 for each
difference. The last individual index remaining is that for Consumption of
Tobacco, and although the correlation of sixth differences is not really significant,
it is negative (-.247 ± .235), and is exhibiting a steady negative rise.
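The probable errors quoted alongside the sixth-difference correlations in this paper appear to be consistent with the simple form 0.250 (1 - r²). That constant is inferred here from the printed figures; it is an assumption for illustration, not a formula stated in the text. A minimal Python check on three quoted pairs:

```python
def probable_error(r: float, k: float = 0.250) -> float:
    """Probable error of a correlation r, assuming the form k * (1 - r**2).

    k = 0.250 is an assumed constant inferred from the printed values.
    """
    return k * (1.0 - r * r)


# (correlation, probable error) pairs as quoted in the text.
quoted = [(0.427, 0.205), (-0.247, 0.235), (-0.514, 0.184)]

for r, pe in quoted:
    # Print the quoted probable error next to the value implied by the assumed form.
    print(r, pe, round(probable_error(r), 3))
```

Agreement to within about .001 is all that should be expected, since the printed correlations are themselves rounded.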

Stripped therefore of the common time factor the Synthetic Index will be seen 
to be no very appropriate measure of trade, business activity, and spare money for 
savings and luxuries. With Post, Stamp Duties and Savings, it has probably only 
a spurious relationship, expenditure on railways has little influence, that on 
luxuries is very slightly significant, or indeed in the case of tobacco negative. It 
is, however, closely related to variations in external trade, i.e. imports (including 
coal), to exports and shipping and to effective revenue. It appears to us that a 
suitable general index of prosperity, which will distinguish between a continuous 
growth in all factors with the time, and favourable and unfavourable fluctuations 
from this growth, can only be obtained, when there has been far more ample 
study of the associations of individual indices among themselves, and of these 
indices after they have been freed from the time factor, i.e. of associations 
between high difference correlations. From this standpoint the study of General 
Index theories is at present in its infancy. 


(b) Railway Index. This index is very noteworthy in the nature of its
associations after removal of the time factor. We have reached a steady
correlation (c. .62) with Shipping, but beyond this no values of first class
importance appear. The relation of Railways and Revenue, after falling practically
to zero, now stands at something greater than .42 and might rise higher, but
the relation to International Commerce as a whole is zero, which suggests that
the goods imported and exported are not in the bulk carried by rail. Further,
although the final value of the Railway and Post correlation is scarcely sensible
(-.214 ± .259), it has been continuously negative from the second difference, and
thus suggests that increased expenditure on the post means lessened profit for
the railways. This might be interpreted in two ways: (i) that business
conducted by post or telegram lessens rail intercommunication by person, or (ii) that
in the case of state-railways, there is not an increased profit to the railways from
carrying larger mails. But still more remarkable are the negative correlations of
Stamp Duties, Savings, Tobacco and Coffee with Railways; none of them are very
large, and all but savings, perhaps, of the order of their probable errors. But
taken as a whole they suggest that when the Italian spends little money in going
about, then he saves more, or spends more on such luxuries as tobacco and coffee.
Lastly we have the Coal Index. It might be supposed that a year with great
coal importation would signify great railway activity, and this is the judgment
which would be made from the raw correlations of these variates. But the actual
facts are exhibited in a correlation still falling at the sixth difference and hardly
significant having regard to its probable error. The inferences formed must be:
(i) that imported coal is used largely at the ports of disembarcation or travels
inland by other than railway transit, (ii) that the imported coal is largely used on
the railways themselves and that its cost is a heavy tax on their resources.

(c) Shipping Index. As we might anticipate this is highly correlated with
(i) Railways (c. .62), (ii) Revenue (c. .75) and less highly but very significantly with
(iii) International Commerce (c. .54) and (iv) Coal (c. .58), but it appears to have
no relation whatever with Post, Stamp Duties and Savings, and when we come to
luxuries, their importation is clearly not a factor of shipping prosperity. Neither
in the case of Tobacco nor of Coffee are the correlations really significant; with
the former we have an increasing negative correlation and with the latter a
decreasing positive one already below its probable error. Thus we see that
neither directly by bulk of importation nor indirectly by immediate increase of
consumption, does a rise of shipping mark significant rises in the use of luxuries
such as Tobacco and Coffee. It would be of interest to ascertain whether
increased consumption of luxuries does not rather follow than accompany favourable
trade fluctuations.

(d) Revenue Index. This index as we might expect is fairly highly
correlated with Shipping (c. .75). It has relatively small relation to Railways
(.422 ± .206), at least at the sixth difference, and a somewhat similar value
(c. .42 ± .20) for Coffee. Thus the suggestions arise that revenue is but little
produced by the railways and that coffee is not a very large factor of the custom
dues. It is astonishing to find, however, that Post, Stamps and Savings have
negative correlations with Revenue of -.388 ± .213, -.255 ± .234 and -.154 ± .244
respectively, which, if scarcely significant, have been in each case for several
differences persistent in sign. Even the correlation with Tobacco is small, falling
and insignificant (< .115 ± .247), and that with Coal, which might be supposed to
be high as marking good trade times, is hardly significant although apparently
rising (? > .270 ± .232). Lastly the correlation of Revenue with International
Commerce is again small, falling, and insignificant (< .214 ± .259). Thus Revenue
or the “entrato effectivo dello stato” seems to provide an index which has little
valuable relation to any other characteristic of prosperity beyond shipping.


(e) International Commerce. Here we find no single final individual index
correlation greater than .54, which is that for Shipping. The next most important
correlations are with Post (> .47) and Stamp Duty (c. .46). With Railways the
correlation is zero, and with Revenue also falling and insignificant. With Savings,
Coal, Tobacco and Coffee the correlations are all insignificant; in fact in the last
three cases not only are the values less than their probable errors, but they are
still falling. It is thus clear that in Italy the total of Exports and Imports is
no measure of all-round prosperity; they do not immediately increase either
savings or the consumption of luxuries.


(f) Post and Telegrams. Here we have the lowest series of correlations we
have yet reached. Post values have no significant relation to fluctuations in
Railway (c. -.20 ± .24), to Shipping (-.059 ± .249), Stamp Duties (-.027 ± .250),
Coal (-.050 ± .250), Tobacco (+.108 ± .247) or Coffee (-.133 ± .246) Indices. It
is significantly correlated only with International Commerce (> +.47 ± .19) and,
perhaps, significantly with Savings (+.336 ± .222) but negatively with Revenue
(c. -.38 ± .21). In short the number of letters and telegrams in Italy is hardly a
mark of any other favourable fluctuation in prosperity, beyond International
Commerce.


(g) Stamp Duties. This Index is correlated positively and significantly with
International Commerce (c. +.46 ± .20) and positively, and doubtfully, with Savings
(c. +.35 ± .22). It is correlated insignificantly and negatively with Railways
(-.261 ± .233), Shipping (-.009 ± .250), Revenue (-.255 ± .234), Post (-.027 ± .250),
and Tobacco (-.129 ± .246); it is correlated positively and insignificantly with
Coal (+.052 ± .250) and Coffee (+.222 ± .238). Thus again, freed from continuous
time changes, fluctuations in the Stamp Duty Index are of small value as a
measure of contemporaneous general prosperity.


(h) Savings Bank Index. There are practically only two correlations of any
importance with Savings and these are both negative, namely those with Railways
(-.431 ± .204) and with Tobacco (-.431 ± .204). Hence it would appear that
when the Italian people is in a saving mood, it spares on transit by rail and on
the consumption of tobacco, and when it expends on these luxuries, then it does
not save. Savings have small and possibly not significant correlations with Post
(> +.33 ± .22) and Stamp Duties (< +.353 ± .219), and insignificant and positive
correlations with International Commerce (> +.27 ± .23), Coal (> +.19 ± .24)
and Coffee (c. +.05 ± .25); they have insignificant negative correlations with
Shipping (< -.03 ± .25) and Revenue (< -.15 ± .24).


Savings are thus—apart from continual time change—no very satisfactory 
measure of general prosperity, and a fluctuating increase is usually accompanied 
by a reduction of luxuries. 


(i) Coal Index. The importation of coal has little relation to any factor of
prosperity besides Shipping (c. +.58 ± .17). With Railways the correlation is
not quite double the probable error and the value, even at the sixth difference,
appears still falling. The correlation with Revenue only just exceeds the probable
error (+.270 ± .232). With International Commerce (+.171 ± .243), Stamp Duties
(+.052 ± .250), Savings (+.196 ± .241) and Coffee (+.152 ± .245) the correlations
are less than their probable errors, small and in some cases still falling. With
the Postal Index, the correlation is negative, insignificant and falling. Alone in
the case of the Tobacco Index does the correlation appear to be nearly as
significant as in that of Shipping, but it is negative and increasing* (-.514 ± .184),
while in the case of Shipping it was steady. It is singular to find that Coal, the
increased import of which should mark increased industrial activity, is, beyond
the naturally influenced Shipping, alone effectively associated with the
consumption of Tobacco.


(j) Tobacco Index. This is of considerable interest as marking the association
of indices of trade prosperity with the consumption of a luxury. With four
exceptions Tobacco is negatively correlated, although often insignificantly, with
the other indices. Revenue (+.115 ± .247), International Commerce (+.015 ± .250),
and Post (+.108 ± .247) are all positive, insignificant, and in the first two cases
still falling. The correlation with Coffee is positive and might, perhaps, be
significant (+.326 ± .224), but it appears to be still falling. With Coal and
Savings there are probably significant negative correlations (-.514 ± .184, and
-.431 ± .204 respectively); with Railways (-.243 ± .236), Shipping (-.271 ± .229)
and Stamp Duties (-.129 ± .246) there are insignificant negative correlations, but
they tend to confirm each other in sign. Thus we see that the consumption of
tobacco can hardly be considered as a measure of general prosperity; it appears
to be greatest when trade conditions are unfavourable, and in particular when
savings are least and manufacturing conditions as measured by the importation of
coal are slack. The result suggests the pipe of the unemployed at the street
corner, rather than the increased expenditure of the fully occupied artisan.


* It is, perhaps, hard to believe that so much smuggling could be carried on in colliers, that it would
seriously affect the profits of the tobacco monopoly!


(k) Coffee Index. This is another luxury and the results are very similar.
There appears a significant correlation with Revenue (+.400 ± .210), which might
easily be explained, and there is a falling but possibly significant correlation
with Tobacco (+.326 ± .224). With all other indices the relationships are
insignificant: Railways (-.204 ± .240), Shipping (+.232 ± .237), International
Commerce (+.142 ± .245), Post (-.133 ± .246), Stamp Duties (+.222 ± .238),
Savings (+.046 ± .250) and Coal (+.152 ± .245). Apart therefore from the
general increase of consumption with the time, during which time the general
prosperity of the nation has increased, it would not appear that the consumption
of a luxury has any organic relationship to prosperity. We do not find that a
favourable trade fluctuation is associated with increased consumption of luxuries.
In fact the suggestion arises that in the case of tobacco the consumption may be
greater in a period of depression.


Conclusions. While we lay no special stress on any of the results suggested
by the difference correlations above studied (far more intimate economic
knowledge of Italian affairs and methods of measurement would be requisite), we yet
venture to insist on one or two general considerations.


The very superficial statements, so frequently met with, that such and such 
variates, both changing rapidly with the time, are essentially causative will 
doubtless cease to have any scientific currency, directly the method of variate 
differences is fully appreciated. We shall no longer assert that the fall of 
the phthisis death-rate can be off-hand causatively associated with the
contemporaneous rise in the number of persons dying in institutions, or that the
increased expenditure on luxuries is necessarily a measure of increased national 
prosperity. 


If we turn as in the present paper to the actual correlations of the indices
themselves, we find in every case an arid and scarcely undulating waste of high
correlation. No one can obtain any nourishment whatever from the statement
that the Tobacco Index is correlated with the Revenue Index to the amount of .983
and with the Savings Bank Index to the extent of .984! The organic relationship
between these variates is wholly obscured by the continuous increase of all three of
them with the time. But when we proceed to sixth differences and see that the
consumption of tobacco has little, if any, relation to revenue, and is associated
substantially but negatively with savings, we seem to touch realities, and realities
of some worth. Again what can we learn, if we are told that the Shipping Index is
correlated to the extent of .99 with both the Revenue and the Savings Bank Indices?
We might imagine that increase of shipping was not only the primary cause of
increase in Italian revenue, but also the essential origin of any increase in the Italian
peasant’s and artisan’s savings! An appeal to the variate difference method shows
how fallacious such imaginings would be! An examination of the sixth difference
correlations shows that while prosperity of the revenue is closely associated with
trade as measured by shipping (.77), the correlation is not nearly perfect; on
the other hand there appears to be no significant organic correlation at all
(-.154 ± .244) between the prosperity of the revenue and the savings of the
Italian populace. As we have noted, a knowledge of local conditions and methods
of reckoning quantities might enable us to put other and, perhaps, more luminous
interpretations on our results. But there can be small doubt that to proceed from
the actual correlation of such indices to the correlations of their higher differences
gives the feeling of clearing away the sand of the desert, and reaching all the
ordered arrangements of an excavated town below; the slight undulations of the
waste above are really fallacious, and enable us to appreciate nothing of the
actual topography of the city.
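The contrast drawn above, between near-unity raw correlations and much smaller difference correlations, can be illustrated with a small Python sketch. Everything in it is hypothetical: the two series are synthetic, sharing nothing but a linear time trend, and a long series (5000 terms rather than the 28 of the present paper) is assumed so that sampling fluctuation stays small. The function names are illustrative only.

```python
import random
from math import sqrt


def difference(series, order):
    """Return the order-th successive difference of a list of numbers."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def correlation(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def demo(n=5000, order=6, seed=1):
    """Two synthetic series sharing only a time trend.

    Returns (raw correlation, correlation of the order-th differences).
    """
    rng = random.Random(seed)
    # Independent random parts superposed on linear growth with the time.
    x = [0.5 * t + rng.gauss(0.0, 1.0) for t in range(n)]
    y = [0.3 * t + rng.gauss(0.0, 1.0) for t in range(n)]
    return correlation(x, y), correlation(difference(x, order), difference(y, order))


raw_r, diff_r = demo()
print(round(raw_r, 3), round(diff_r, 3))
```

The raw correlation reflects only the common growth with the time; differencing removes the trend and leaves the correlation of the independent random parts.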


The method is at present in its infancy, but it gives hope of greater results 
than almost any recent development of statistics, for there has been no source 
more fruitful of fallacious statistical argument than the common influence of the 
time factor. One sees at once how the method may be applied to growth problems 
in man and in lower forms of life with a view to measuring common extraneous 
influences, to a whole variety of economic and medical problems obscured by the 
influences of the national growth factor, and to a great range of questions in 
social affairs where contemporaneous change of the community in innumerable 
factors has been interpreted as a causative nexus, or society assumed to be at 
least an organic whole; the flowers in a meadow would undoubtedly exhibit 
highly correlated development, but it is not a measure of mutual causation, and 
the development of various social factors has to be freed from the time effect, 
before we can really appreciate their organic relationships. 


In the present paper we have dealt only with very sparse “ populations” (only 
28 values of the variates), but this has enabled us to consider not only a very 
large number of correlations, but to see the practical influence of terminal
conditions on our theory. This may we think be summed up in the statement that
the Andersonian formulae for the standard deviations will hardly in many practical 
cases be more than very roughly approximated before the size of the population 
becomes too small to make the deductions reliable. Further in most cases our 
difference correlations have hardly even with the sixth differences reached a steady 
state. Possibly they have done so in the cases of Rail and Shipping, Shipping 
and Post, Shipping and Coal, Revenue and Post, International Commerce and 
Stamp Duties, International Commerce and Savings, Savings and Coffee, and in 
one or other additional pair. But in the great bulk of instances there is still a 
more or less steady rising or falling appreciable in the difference correlations, and 
all we can really say is that the final value, the true $r_{XY}$, will be somewhat
greater or less than a given number. From an examination of the actual 
numerical working of the correlations, it appears to us that the terminal values 
are in the case of these short series of very great importance. It is further clear 
that the theory as given by “Student” depends upon certain equalities which 
are not fulfilled in practice in short series. We await with much interest the 
complete publication of Dr Anderson’s work, and hope to find a fuller discussion of 
the allowance to be made in short series for the influence of the terminal state of 
affairs* on the steadiness of the series and on the approach to the
standard-deviation formulae. But apart from these lesser points, our present numerical
investigation has convinced us of the very great value of the new method of 
Variate Difference Correlations. 


* For example if we measure X from its mean,
for n large; but for n small as above such a relation as $\sigma^2_{\Delta X} = 2\sigma^2_X$, and the similar but more complex
relations of the standard deviation formulae for the higher differences need not hold for any individual
case, and thus the steadiness of the difference correlation series, and the approach to the Andersonian
formulae are very far from attained.


AN EXAMINATION OF SOME RECENT STUDIES OF THE 
INHERITANCE FACTOR IN INSANITY 


By DAVID HERON, D.Sc.* 


In the last few years a number of studies of the inheritance factor in insanity 
have been published in America, Germany and England. The value of investigation 
of such a topic cannot be overestimated. We are quite certain that the prevalence 
of insanity is not falling; many of us indeed believe that the statistics suffice to 
demonstrate that it is substantially increasing, and that we can attribute this 
increase not in the first place to the intenser strain of modern life, but to the 
greater power of modern treatment to check or temporarily cure attack, and thus 
allow wider possibility of reproduction to members of affected stocks. Indeed the 
problem seems closely associated with an essential difficulty of modern civilisation, 
the greater protection of physically and mentally degenerate stocks unaccompanied 
by any adequate limitation of their thereby increased power of procreation ; the 
inheritance factor thus tends to aid the relatively greater survival of the socially 
unfit. The studies we have referred to would be of great importance from this 
aspect of eugenics if (i) the data were collected without conscious or unconscious 
bias, and (ii) the inferences drawn from them followed logically from the data thus 
collected. 


Unfortunately it is not only in the interpretation of statistics that adequate 
training is required. It is equally important that in the actual collection of them 
we should proceed, not only free from the bias which arises from the hurried 
acceptance of dogmatic theories of heredity, but what is often still more needful, 
free from the bias which is almost certain to waylay our progress, if we have not 
initially considered with trained insight the fallacies which may result from our 
method of recording or even tabulating our material. The day of the amateur in 
science is gone; no one now pays any attention to men who propound elaborate
atomic theories or stellar hypotheses, without having had preliminary training in 
physical or astronomical science. There are still, however, some who appear 
willing to accept the statement of statistical data or the inferences drawn from those 


* This paper formed the second portion of a lecture given at the Galton Laboratory on March 3, 1914. 
data by men who have clearly had no adequate training in statistical science. The
craniologist, the anthropologist, even the biological student of heredity and
evolution are recognising that a statistical training is needful for the true interpretation
of many of the facts in their special fields of research. The physiologist still 
appears to believe that he can deal with the average effects of diverse dietaries or 
the pathologist with the “ mass-phenomena” of the hereditary factor in insanity 
without any training in statistical method. A physicist might just as logically 
assume that without mathematical training he could give an adequate mathe- 
matical account of a physical phenomenon, or a cosmic theorist suppose that he 
was effectively furnished for astronomical research by the perusal of a popular 
primer on the stars! The statistical calculus cannot be mastered by any easier 
road than the differential calculus, or, to put a more apt illustration, statistical 
training is as needful a preliminary to the handling of statistics, as time spent 
in a physiological laboratory to the effective handling of tissues. In twenty years 
it will be unnecessary to insist on these points, they will be universally recognised 
in the courts of science; but at present it is not only necessary to reiterate 
unpleasant truths, but to emphasise their validity by illustrations which bring 
home forcibly to scientist and layman alike the danger of amateur statistical 
handling. To state that a man is in error is not sufficient, if he continues time 
after time to repeat his assertions, apparently under the belief that incessant 
repetition will convince the world of the value of his theories. 


In the case of the inheritance factor in insanity we are not dealing with any 
purely academic question of science. We are up against one of the most difficult 
problems of modern life, where true advice is of urgent importance to the nation 
as well as to the individual. It is not only the medical man but the layman who 
seeks guidance in the question of the marriage of members of insane stocks, and a 
laboratory like the Galton Laboratory knows how often advice on such points is 
sought. It is disheartening when help is rendered to the seeker to be faced with 
the criticism: “But Professor says I may marry if I take a wife of sound
stock,” or “Dr recommends marriage, although my father was insane, because
I am over twenty-five and still sane myself.” When teaching of this kind, arising
solely from false interpretation of defective data, is spread widecast in a dozen
different papers or journals, it is not sufficient to issue a brief statement of its
futility. It is needful to give it the coup de grâce by a more lengthy criticism of
its fallacies and their illustration in a form more likely to impress the imagination. 
The attempt is made in this paper to deal with only one of the authors, who have 
contributed fallacious eugenic rules to those seeking knowledge on the influence of 
the hereditary factor in insanity. 


In a long series of papers Dr F. W. Mott, Pathologist to the London County 
Asylums, has stated that when the children of insane parents become insane, they 
do so at a much earlier age than did their parents, and on the basis of this assertion 
he has drawn some very sweeping conclusions for practical conduct. Thus in the 
British Medical Journal of May 11, 1912 (p. 1060), he states that “this signal
tendency of insane offspring to suffer with a more intense form of the disease and 
at an early age, as shown in the above figures and tables, is of great importance 
for the following reasons: first, it is one of Nature’s methods of ending or mending 
a degenerate stock; secondly, it is of importance to the physician, for he can say 
that there is a diminishing risk of the child of an insane parent becoming insane 
after he has passed 25, a matter of great importance in the question of marriage ; 
thirdly, it is of importance in connection with the subject of social surgery of the 
insane, for when the first attack of insanity occurs in the parent the children for 
the most part have all been born....Sterilization would therefore be applicable to 
relatively few parents admitted to asylums.” 


Put briefly, Dr Mott’s views are that in “Antedating” or “Anticipation,” in
this alleged tendency of the offspring to become insane at an earlier age than
their parents, we have Nature’s method of purifying degenerate stocks, that the 
children of insane parents who are still normal at the age of 25 may safely marry*, 
and that it is useless to take any special measures to limit the reproduction of the 
insane since nearly all their children are born before the onset of insanity. 


These conclusions, if proved to be correct, would be of the utmost importance 
to the Eugenist. If the Law of Antedating or Anticipation really acts in the 
way Dr Mott has suggested, then it would seem to be unnecessary to take any 
special Eugenic action in the case of the insane and indeed the “Law” has already
been used in support of this view. Thus in a leading article in the British Medical 
Journal+, which deals with Dr Mott’s work, it is stated that “This intensification 
of mental disease in the young—this ‘ anticipation’ as it is called, which is one of 
Nature’s methods of ending or mending a degenerate stock, is specially important 
in connection with sterilization, as the figures given by Dr Mott show that when 
the first attack of insanity occurs in the parent the children have for the most part 
all been born. Sterilization, therefore, would be applicable in relatively few
cases.”


It is at least obvious that when views such as these are taken of the “Law of 
Anticipation,” it merits the most careful examination. Let us consider, then, first 
of all, Dr Mott’s presentation of the case for anticipation. For some years past 
Dr Mott has been engaged in the collection of cases in which two or more members 
of a family are or have been resident in London County Asylums, and has noted 
wherever possible the age of onset of the insanity. Information was thus obtained 
regarding 217 pairs of father and offspring, and 291 pairs of mother and offspring 
and the results are summed up in the following table. 

* See for instance Problems in Eugenics, p. 426.
+ May 11, 1912, p. 1089.


Thus in comparing the age at onset of insanity in father and offspring, we find
that among the fathers only 1.4% became insane before the age of 20, while among
the offspring the percentage was 26.2. These figures are also shown graphically in
Figs. 1 and 2*. Here the horizontal scale represents the age of onset in 5-year
groups; the vertical scale the percentages of cases occurring in each age group.
TABLE I.

Percentages of Cases whose First Attack of Insanity occurred
within Various Age-periods.

Age-periods    | Father    | Offspring | Mother       | Offspring    |
               | Per cent. | Per cent. | Per cent.    | Per cent.    |
Under 20 years |   1.4     |   26.2    |    0.6       |   27.8       |
20-24 years    |   0.4     |   18.0    |    3.4       |   15.7       |
25-29   ,,     |   1.4     |   18.0    |    4.4       |   18.2       | Adolescence
30-34   ,,     |   9.6     |   13.0    | [illegible]  |   13.4       |
35-39   ,,     |  11.5     |    7.3    |    9.2       |   10.0       |
40-44   ,,     |   9.2     |    6.4    |   10.3       |    5.8       |
45-49   ,,     |  14.3     |    6.0    |   12.0       |    3.7       | Involutional
50-54   ,,     |  17.5     |    0.9    |   12.3       |    2.4       | period
55-59   ,,     |  13.8     |    3.7    |   14.0       | [illegible]  |
60-64   ,,     |  10.1     |     -     |   11.6       |    1.3       |
65-69   ,,     |   5.0     |     -     |    8.8       |     -        |
70-74   ,,     |   4.6     |    0.4    |    3.1       |     -        |
75-79   ,,     |   0.4     |     -     |    1.3       |     -        |
80-            |   0.4     |     -     |    0.6       |     -        |

I have been obliged to follow Dr Mott in treating the “under 20” group as a 
5-year group as otherwise my diagrams would bear no resemblance to his, but this 
procedure is far from satisfactory when such a large proportion of the cases in this 
group are congenital cases in which the age of onset should be taken at 0 years. 
The tables and diagrams show that among the parents more than half the cases 
occur after the age of 50, while among the offspring, more than half occur before 
30, and this is taken to prove that there is Anticipation or Antedating in 
Insanity. 


This will perhaps be made more evident if the percentages of those who became 
insane before the age of 25 are given in each case. Among the fathers, 2%, and
among the mothers, 4% became insane before the age of 25. Among the
offspring, on the other hand, the percentage is 44. Another way of looking at the
matter is to take the average age of onset of insanity in each case. Dr Mott 
gives a Table showing these averages but unfortunately has omitted the congenital 
cases so that the extent of anticipation is considerably under-estimated, and the 
form in which the data are given does not permit of an accurate calculation of the 
actual averages. From the information given it appears, however, that the average 
age at onset of insanity among the parents is about 50 years, among the offspring 
about 26 years, showing an anticipation or antedating of some 24 years. 

* I am very grateful to Miss H. Gertrude Jones, the Hon. Secretary of the Galton Laboratory, for
the diagrams which illustrate this lecture.

Now these conclusions, if satisfactorily demonstrated, would obviously be of
the highest importance, but they were immediately challenged by Professor Karl 
Pearson in a letter which appeared in Nature of November 21, 1912 (p. 334). 
Professor Pearson’s letter is as follows: 


On an Apparent Fallacy in the Statistical Treatment of “ Antedating” in 
the Inheritance of Pathological Conditions. 


The problem of the antedating of family diseases is one of very great interest, and is likely 
to be more studied in the near future than ever it has been in the past. The idea of antedating, 
i.e. the appearance of an hereditary disease at an earlier age in the offspring than in the parent, 
has been referred to by Darwin and has no doubt been considered by others before him. Quite 
recently, studying the subject on insanity, Dr F. W. Mott speaks of antedating or anticipation 
as “Nature’s method of eliminating unsound elements in a stock” (“Problems in Eugenics,” 
papers communicated to the First International Eugenics Congress, 1912, p. 426). 


I am unable to follow Dr Mott’s proof of the case for antedating in insanity. It appears to
me to depend upon a statistical fallacy, but this apparent fallacy may not be real, and I should 
like more light on the matter. This is peculiarly desirable, because I understand further 
evidence in favour of antedating is soon forthcoming for other diseases, and will follow much the 
same lines of reasoning. Let us consider the whole of one generation of affected persons at any 
time in the community, and let $n_s$ represent the number who develop the disease at age $s$,
then the generation is represented by

$n_0,\ n_1,\ n_2,\ \ldots,\ n_s,\ \ldots,\ n_{100}$, say.


Possibly some of these groups will not appear at all, but that is of little importance for our 
present purpose. 


Let us make the assumptions (1) that there is no antedating at all; (2) that there is no 
inheritance of age of onset; thus each individual reproduces the population of the affected 
reduced in the ratio of p to 1. Then the family of any affected person, whatever the age at 
which he developed the disease, would represent on the average the distribution 


$p n_0,\ p n_1,\ p n_2,\ \ldots,\ p n_s,\ \ldots,\ p n_{100}$.

The sum of such families would give precisely the age distribution at onset of the preceding 
generation. 


Now let us suppose that for any reason certain of the groups of the first generation do not 
produce offspring at all, or only in reduced numbers. Say that $q_s$ only of the $n_s$ are able
to reproduce their kind; then of the older generation, limited to parents, the distribution
will be

$q_0 n_0 + q_1 n_1 + q_2 n_2 + \ldots + q_s n_s + \ldots + q_{100} n_{100}$,

but the younger generation will be

$p\,(q_0 n_0 + q_1 n_1 + q_2 n_2 + \ldots + q_s n_s + \ldots + q_{100} n_{100})\,(n_0 + \ldots + n_s + \ldots + n_{100})$;

i.e. the relative proportions will remain absolutely the same.


The average age at onset and the frequency distribution of the older generation, that of the 
parents, will be entirely different from that of the offspring and will depend wholly on what values 
we give to the $q$’s. If frequency curves be formed of the two generations they will differ
substantially from each other. This difference is not a result or a demonstration of any 
physiological principle of antedating but is solely due to the fact that those who develop the 
disease at different ages are not equally likely to marry and become parents. 
A quite striking instance of the fallacy, if it be such, would be to consider the antedating of
“violent deaths.” Fully a quarter of such deaths in males, nearly a half in females, occur before
the age of twenty years. Consider now the parents and offspring who die from violent deaths;
clearly there would be no representative of death from violence under twenty in the parent 
generation, and we should have a most marked case of antedating, because the offspring 
generation would contain all the infant deaths from violence. 


In the case of insanity, is the man or woman who develops insanity at an early age as likely 
to become a parent as one who develops it at a later age? I think there is no doubt as to the 
answer to be given; those who become insane before twenty-five, even if they recover, are far
less likely to become parents than those who become insane at late ages—many, indeed, of them 
considering the high death-rate of the insane, will die before they could become parents of large 
families. Now Dr Mott took 508 pairs of parents and offspring, “collected from the records of 
464 insane parents whose 500 insane offspring had also been resident in the County Council 
Asylums,” and ascertained the age of first attack. As at present advised, it seems to me that 
his data must indicate a most marked antedating of disease in the offspring, but an antedating 
which is wholly spurious. There is, I think, a further grievous fallacy involved in this method 
of considering the problem, but before discussing that I should like to see if my criticism of this
method of approaching the problem of antedating can be met.

KARL PEARSON.


BIOMETRIC LABORATORY,
UNIVERSITY COLLEGE, LONDON,
November 11, 1912. 
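
Professor Pearson's argument can be checked with a little arithmetic. The following Python sketch is an illustration only: the onset distribution `n`, the fractions `q` who become parents and the ratio `p` are invented values, not figures from the letter or from Dr Mott's data. It assumes, as the letter does, that there is no antedating and no inheritance of age of onset.

```python
# Illustrative check of the argument in the letter above.
# All numbers are hypothetical; none comes from any asylum record.

def mean_age(freq):
    """Mean age of onset for a {age: frequency} distribution."""
    total = sum(freq.values())
    return sum(age * f for age, f in freq.items()) / total

# n[s]: affected persons of one generation whose onset is at age s.
n = {10: 20, 20: 20, 30: 20, 40: 20, 50: 20, 60: 20}
# q[s]: assumed fraction of the n[s] who are able to become parents.
q = {10: 0.0, 20: 0.1, 30: 0.4, 40: 0.8, 50: 1.0, 60: 1.0}
# p: ratio in which each family reproduces the affected population.
p = 0.5

# Older generation, limited to parents: q_s * n_s at each age s.
parents = {s: q[s] * n[s] for s in n}
number_of_parents = sum(parents.values())

# Younger generation: every parent's family is distributed as p * n_s,
# whatever the parent's own age of onset.
offspring = {s: p * number_of_parents * n[s] for s in n}

parent_mean = mean_age(parents)
offspring_mean = mean_age(offspring)

print("mean onset, parents:  ", round(parent_mean, 1))
print("mean onset, offspring:", round(offspring_mean, 1))
print("apparent antedating:  ", round(parent_mean - offspring_mean, 1))
```

With these invented values the parents appear to fall ill about twelve years later than their offspring, although the two generations were given the same underlying distribution; the whole difference comes from the choice of `q`.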


Dr Mott has referred to this letter in his Report for 1912*, but it will be more 
convenient to deal with his reply after we have examined the method by which his 
data have been collected and the use made of the data. Let us consider first of all
how the data were obtained. Dr Mott in describing his material says that it 
consists of a collection of cases in the London County Asylums where two or more 
persons are related to one another. Thus Dr Mott has dealt—not with a series of 
complete pedigrees in which every member is included, whether insane or normal, 
but with a series of cases in which two or more members of a family are known to 
have been in London County Asylums. No notice is taken of those who are 
normal throughout their lives and no allowance is made for those who are normal 
at the time the record is made but who may afterwards become insane. 


Do cases selected in this way provide a complete or impartial view of the facts? 
Some of Dr Mott’s own comments on his data throw a considerable amount of 
light on this point. In his Report for 1909† he says: “From all the Asylums
I have received valuable reports, but in the case of the older asylums it has been a
matter of the utmost difficulty to trace the records of so many years back,” and in
his Report for 1910‡ he says, “Some of the asylum authorities have gone through
their case books for a number of years back, but the results have not been
satisfactory owing to the difficulty of obtaining particulars without a living
representative of the family being resident in the asylum; for instance, 110 old cases
reported from Bexley have been rejected as the relatives in the other London
County Asylums could not be traced, for no instance has been included unless full
particulars could be obtained.”

* Annual Report of the London County Council for 1912, Vol. 11. p. 62.
† Twentieth Annual Report of the Asylums Committee of the L.C.C., p. 90.
‡ Twenty-first Report of the Asylums Committee of the L.C.C., p. 94.


It is thus clear that not all the cases could be traced and that there was 
special difficulty in tracing the older cases. What is the effect of a selection 
of this kind? A study of the following hypothetical cases may serve to throw 
some light on this point. 

TABLE II.


Anticipation or Antedating in Insanity. Hypothetical Examples to
show the Effect of Dr Mott's Selection of Cases.


                                                  First Example   Second Example

Mother:  Born                                     1873            1833
         Married                                  1893            1853
         Became Insane and admitted to Asylum     1913            1873
         Age at First Attack                      40              40
         Died                                     1914            1874

Son:     Born                                     1894            1854
         Became Insane                            1894*           1914
         Admitted to Asylum                       1914            1914
         Age at First Attack                      0               60


The mothers in these two examples have exactly parallel careers. In each
case the mother became insane at the age of 40 and only lived one year in the 
asylum. In the first case the son was a congenital idiot but was only admitted to 
an asylum at the age of 20. The age of onset in this case is taken at 0 years and 
the case shows marked “ anticipation.” In the second case the mother also became 
insane at the age of 40, the son not till the age of 60, 40 years after his mother’s 
death. The second example thus tells against the Law of Anticipation. Are 
these two cases equally likely to appear in Dr Mott’s data ? 


In the first case mother and son are in the asylum at the same time and were 
admitted within a year of each other. It is very improbable that the relationship 
would escape notice and such a case is almost certain to be recorded. In the 
second case, however, the son is not admitted to an asylum till 40 years after his 
mother’s death. Even if the family remained in the same area for 40 years after 
the mother’s death, it would obviously be very difficult to connect the histories of 
mother and son. This case, which tells against the Law of Anticipation, is 
almost certain to escape notice. A spurious anticipation or antedating is thus 
inevitable owing to the method of collecting the data. 
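
The effect of this kind of selection can be seen in a small enumeration. The Python sketch below is a toy model under stated assumptions: the set of onset ages, the parent's age at the child's birth and the 15-year linking window are all hypothetical, chosen only so that parent and child have identical onset distributions before any selection is applied.

```python
from itertools import product

# Hypothetical values, chosen only for illustration.
ONSET_AGES = [30, 40, 50, 60, 70]   # possible ages at first attack
PARENT_AGE_AT_BIRTH = 25            # parent's age when the child is born
LINK_WINDOW = 15                    # max years between the two admissions
                                    # for the relationship to be noticed


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def admission_gap(parent_onset, child_onset):
    """Years between the two admissions, counted from the parent's birth."""
    return abs((PARENT_AGE_AT_BIRTH + child_onset) - parent_onset)


def anticipation(pairs):
    """Mean parental onset age minus mean offspring onset age."""
    return mean(p for p, _ in pairs) - mean(c for _, c in pairs)


# Every combination of parent and child onset age, equally weighted:
# by construction there is no real anticipation in this population.
all_pairs = list(product(ONSET_AGES, ONSET_AGES))

# Only pairs admitted close together in time get recorded.
recorded = [pair for pair in all_pairs if admission_gap(*pair) <= LINK_WINDOW]

print("anticipation, all pairs:     ", anticipation(all_pairs))
print("anticipation, recorded pairs:", anticipation(recorded))
```

In this toy population the complete set of pairs shows no anticipation at all, while the recorded subset shows a large one, purely because pairs admitted far apart in time are dropped.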


* Congenital Idiot.

It has also been pointed out that Dr Mott has made no allowance for those
who are mentally normal at the time the record is made but may subsequently
become insane, and this introduces further spurious anticipation. Another
hypothetical example will perhaps make this clear. Let us take the case of a mother
with six children, five of whom have become insane as follows:


TABLE III.

                                               Children
                             Mother    1       2       3       4       5       6

Born                         1830      1850    1852    1854    1856    1858    1860
Became Insane                1860      -       1872    1896    1914    1888    1860†
Age at Onset of Insanity     30        -*      20      42      58      30      0


The extent to which this family would show anticipation or antedating 
would depend very largely on the time at which the record was made as is shown 
in the following table. 


TABLE IV.

                Age of Onset of Insanity in
Date of                                             Average for    Amount of
Record          Mother     Children                 Children       Anticipation

1860            30         0                        0              30
1872            30         0, 20                    10             20
1888            30         0, 20, 30                16.7           13.3
1896            30         0, 20, 30, 42            23.0           7
1914            30         0, 20, 30, 42, 58        30             0


If the case were noted in 1860 then the age of onset of insanity in the mother 
is 30 years—of the child 0 years—a clear case of anticipation, and nothing 
would be known of the fact that four other children will afterwards become insane 
and will bring the average age of onset in the children up to 30 years—exactly 
the same as that of the mother. Nor is the record even now complete for if the 
eldest child ever becomes insane, the age of onset in his case must be at least 
64 years and this will further increase the average age of onset in the children. 
It is thus clear that in dealing with incomplete families and ignoring the possi- 
bility that those who are normal at the time of record may afterwards become 
insane, Dr Mott has introduced a further spurious anticipation or antedating. 
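
The figures of Table IV follow mechanically from the dates in Table III, as the short Python sketch below shows. The dates are those of the hypothetical family in the text; the function name `record_made_in` is introduced here only for illustration.

```python
# Hypothetical family of Table III: the mother became insane at 30.
MOTHER_ONSET_AGE = 30

# (year of birth, year of onset) for the five children who became insane.
# The eldest child, born 1850 and still normal in 1914, never enters.
CHILDREN = [(1852, 1872), (1854, 1896), (1856, 1914), (1858, 1888), (1860, 1860)]


def record_made_in(year):
    """Return (onset ages known by `year`, their average, apparent anticipation)."""
    ages = sorted(onset - born for born, onset in CHILDREN if onset <= year)
    average = sum(ages) / len(ages)
    return ages, average, MOTHER_ONSET_AGE - average


for year in (1860, 1872, 1888, 1896, 1914):
    ages, average, anticipation = record_made_in(year)
    print(year, ages, round(average, 1), round(anticipation, 1))
```

The earlier the record is closed, the fewer late cases it contains and the larger the apparent anticipation; by 1914 it has vanished entirely.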


* Alive, 64 years of age and still normal.
† Congenital Idiot.
‡ Problems in Eugenics, p. 413.

If we examine carefully the first pedigree given by Dr Mott at the Eugenics
Congress‡, we see clearly how much of the anticipation recorded by
Dr Mott has probably arisen. Unfortunately this is the only pedigree for which sufficient
details have been given to enable its completeness to be tested. The pedigree
and Dr Mott’s description of it are as follows:


“A.B., an alien Jew, aged 54, was admitted to an asylum for the first time suffering from
involutional melancholia; he has a sister who has not been in an asylum, but, as events turned
out, bore the latent seeds of insanity. The man is married to a healthy woman who bore him a
large family; the first five are quite healthy, then comes a congenital imbecile epileptic (cong.)*,
then two healthy children followed by a daughter who becomes insane at 23, then a son insane
at 22, and lastly two children who are up to the present free from any taint. The sister of A.B.
is married and has a family of ten, seven girls and three boys; one of the females was admitted
to the asylum at the age of 19, and since this pedigree was constructed a brother of hers has
been admitted aged 24. Half-black† circles are insane. The pedigree is instructive; it shows
direct and collateral heredity; it also shows remarkably well the signal tendency to the
occurrence of insanity at an early age in the children of an insane and potentially insane
parent.”


[Pedigree diagram. Labels recoverable from the original: “3 Brothers, 6 Sisters”; ages at onset 23, 22 and 19; “Cong.”; “13 children: 9 Alive, 4 Sons, 5 Daughters. 4 Dead. 3 Insane.”; full black circle = Insane.]


Fig. 3. Pedigree to illustrate the effect of Dr Mott’s selection of cases.
F. W. Mott: “Heredity and Eugenics in relation to Insanity.” Problems in Eugenics, p. 413.


This pedigree was given as above in July 1912, and in an address previously 
delivered before the Manchester Medical Society on Oct. 4, 1911, Dr Mott gave 
the same pedigree, but without any reference to the nephew of A.B. (brother of 
the girl who became insane at 19) who became insane “since the pedigree was 
constructed,” so that this man became insane between 1911 and 1912 and this 
serves to “date” the pedigree. 


* This does not agree with Dr Mott’s pedigree, which gives the congenital case as the seventh
instead of the sixth child.
† According to our usual custom, they are represented by full black circles in Fig. 3.

Now it should be noted that at least five of the children of A.B. are over 23 years
of age and up to the present time healthy. But all these children are alive and if
any one of them afterwards becomes insane, the average age of onset of insanity in
the children will be raised, and it is clear that the more incomplete the pedigree
the greater the amount of spurious anticipation. Again Dr Mott states that in
nephews and nieces the age of onset is earlier than in uncles and aunts. In 1911
this pedigree gives a case in which an uncle became insane at 54, his niece at 19,
but one year later a nephew who became insane at 24 has to be added, thus
raising the average, and there are eight more children some at least of whom
may become insane at later ages. As before the incompleteness of the pedigree
introduces an artificial and spurious anticipation or antedating. The remedy is
obvious; we must only deal with completed families.


A further fallacy involved in Dr Mott’s method of work must now be noted. In 
directly comparing the age of onset in parent and child, Dr Mott has ignored the 
fact that in the parent the incidence of insanity is for all practical purposes 
limited to the age of 20 and over since cases of congenital defect and of adolescent 
insanity hardly ever marry. Among the general population of asylums, however, 
12% become insane before the age of 20 and in Dr Mott’s selected data the
percentage rises to 27—or more than a quarter of the whole become insane before 
20. This in itself causes a very marked spurious anticipation. As Professor 
Pearson has shown (p. 361 above) if we were to investigate the age at death in 
parent and child from accident or violence, we should find the same spurious 
anticipation. 


There are thus three fallacies involved in Dr Mott’s work. In the first place a 
spurious anticipation or antedating arises from the inclusion in the record of 
families whose history has not yet been completed, for those who become insane 
at late ages in the younger generation do not appear. Secondly, even with families 
whose history is completed, those cases in which the insanity of parent and child 
is contemporaneous are far more likely to be recorded than those in which the 
child becomes insane long after the parent*, and thus the cases which show 
anticipation are more likely to appear in the record than those which tell against 
Dr Mott’s views. Thirdly, by directly comparing parent and child, he has practi- 
cally limited one of the two groups which are being compared to ages at onset of 
over 20 years and has thus obtained further spurious anticipation. 


Dr Mott also lays stress on the appearance of insanity in a more intense form 
in the younger generation. “I have proved,” he says†, “that there is a signal
tendency in the insane offspring of insane parents for the insanity to occur at an 
earlier age and in a more intense form in a large proportion of cases, for the form 
of insanity is usually either congenital imbecility, insanity of adolescence, or the more 
severe form of dementia praecox, the primary dementia of adolescence, which is 
generally an incurable disease.” But we have already seen that Dr Mott’s method 
of collecting his data is such that an enormous preponderance of early cases 
of insanity in the younger generation is inevitable and of course such cases are 
largely incurable. Type of disease is very closely related to the age of onset and
by selecting the latter we can alter the proportion of any particular type of
insanity. Dr Mott has obtained his material in such a way that, in the younger
generation, cases of insanity coming on late in life are much less likely to be
recorded than those which appear in early life, and hence the early cases are in a
majority, but the change in age of onset, and consequently of the type, is entirely
spurious and arises solely from the way in which the material has been obtained.

* Dr Mott states (Archives of Neurology, Vol. VI. p. 82) that “the main bulk of the cards (i.e. his
records), however, refer to parents and offspring admitted to the asylums within the last fifteen years.”
† Archives of Neurology, Vol. VI. p. 82.

We can now deal with the reply Dr Mott has made to Professor Pearson’s 
criticisms. In his Annual Report for 1912 (p. 62), Dr Mott says: “ Professor 
Karl Pearson, writing to Nature, November 21, 1912, ‘On an apparent fallacy in 
the statistical treatment of “ Antedating” in the inheritance of pathological con- 
ditions,’ criticises on mathematical grounds the evidence of anticipation. I do not 
feel myself competent to reply to the opinion of such an eminent authority on 
mathematics applied to biometrics, but it does not militate against my conclusions, 
nor explain away the fact that a large proportion of the insane offspring of insane 
parents are affected with imbecility or adolescent insanity; for granting the 
assumption that there is no antedating at all, we might rightly expect the ages 
at onset of insane offspring of insane parents to be comparable with the ages at 
onset of all the admissions to the asylums during the same period*. This is by no 
means the case, for amongst the insane offspring there is a far greater proportion 
affected early in life, as is shown in the following figures and curves” (they appear
here as Fig. 4 and Table V). 


According to these figures the onset of insanity among the recorded insane 
offspring of insane parents is considerably earlier than among the general admis- 
sions to asylums, but it has already been shown that this is due to the fact that 
the data have been selected in such a way that the early cases in the younger 
generation are the most likely to appear. Further, if Dr Mott’s argument be a 
valid one, we might also expect the ages at onset of the insane parents of these 
insane offspring to be comparable with the ages at onset of all the admissions to 
asylums during the same period. This is by no means the case as is shown in 
Fig. 5 below (see also Tables I and V). We see here that the insanity of the 
parents comes on at a much later period than among the general admissions to 
asylums and that there is a far smaller proportion affected early in life. If Dr Mott’s
method of argument be sound, he has not only to deal with an antedating of 
insanity among the offspring but also a post-dating of insanity among the parents. 
Both are of course spurious and arise from the peculiar selection of the data and 
from the fact that, owing to differential death-rates, the ages at onset of “ admis- 
sions” will never be the same as the ages at onset of the admitted—i.e. the asylum 
population—at any time. 


* “We might rightly expect” these ages to be different, because “admissions” are not the same as
the population in the country who have at one time or another been insane. The percentages of total
cases of acute mania, of senile insanity, of congenital idiocy, and of melancholia, who reach the asylums,
are not the same. The reader has to distinguish between the population of admissions, the population
of admitted, and the insane population of the country. A sample of the latter may be reached from
completed family histories, but not from records on admission or from records of an asylum population.

Diagram to illustrate the Distribution of Age at Onset of Insanity among:
(1) The Insane Offspring of Insane Parents.
(2) The Insane Parents of Insane Offspring.
(3) All Admissions to L.C.C. Asylums.

TABLE V. Percentage Comparison of the Age at time of Onset of Insanity in
the Insane Offspring of Insane Parents and the General Admissions to
the London County Asylums.

“Admissions” = direct admissions during last four years; “Offspring” = insane offspring of insane parents.

                       MALE                       FEMALE                     TOTAL
Age at Onset     Admissions   Offspring     Admissions   Offspring     Admissions   Offspring
of Insanity      (4489)       (274)         (5097)       (389)         (9579)       (663)

Under 25         20.0         43.8          20.2         44.2          20.1         44.0
25-34            19.9         27.7          19.9         28.0          19.9         27.9
35-44            21.9         13.8          21.5         16.7          21.7         15.5
45-54            [illegible]  10.2          18.6         7.4           18.2         8.5
55-64            13.3         3.6           12.4         2.8           12.7         3.2
65-74            [illegible]  0.7           5.9          0.8           5.8          0.7
75-              [illegible]  -             1.6          -             [illegible]  -

41 male imbeciles out of 274 offspring.
54 female imbeciles out of 389 offspring.
95 male and female imbeciles out of 663 offspring.
It is possible to illustrate the various fallacies which vitiate Dr Mott’s conclusions
regarding anticipation by considering the age at death of parent and child.
I do not know whether it is generally recognised that it is exceedingly difficult 
to get any considerable body of data in which the ages at death of a parent and all 
his children are given, for of course the record is incomplete and biassed until the 
death of the last surviving member, and in some cases to get a complete record we 
must trace the history of a family for over 150 years. George the IIIrd, for instance, 
was born in 1738 and all but one of his 15 children were still alive in 1810, 72 years 
afterwards, and the last surviving son, Duke of Cumberland and King of Hanover, 
did not die till 1851, 113 years after his father’s birth—and this is by no means 
an extreme case. In the material I am about to describe I found one case where 
the interval was 160 years. 


Another difficulty which arises is the tendency in practically all family 
histories to omit infant deaths, so that we do not get a complete record. It 
seems probable that the deaths of minors are not represented in such records 
in anything like their true proportion and that the differences are greater than 
might be expected to arise from differences of physique and nurture due to 
class. Thus records of the Landed Gentry give 31 deaths per 1000 males under 
20 years* while actual experience shows 163 to 197 per 1000†. But in the
records of the reigning families of Europe we get a practically complete record of 
all members and therefore from von Behr’s Genealogie der in Europa regierenden 
Fürstenhäuser‡, I have extracted particulars of the age at death of over 2000
individuals—all belonging to the 18th century. There was here no selection— 
every child was entered and every family had been traced from the birth of the 
parents till the death of the last survivor. 


Now in Dr Mott’s data we have already seen that cases in which the age at 
onset of insanity in parent and child is contemporaneous are most likely to be 
recorded. We can test the effect of a selection of this kind by investigating the 
effect of selecting, from our data regarding the age at death among those royal 
families, only those individuals who died within a certain number of years of their 
father’s death, and the results are given below in Table VI, p. 370. 


When we deal with the whole of the data, absolutely unselected, every family
being complete and traced to the death of the last surviving member, we find that
680 out of 1829, or 37.2%, died under 20 years of age. Let us now apply a very
slight selection to the data and reject the 92 cases in which the interval between
the deaths of father and child was at least 60 years. We find now that 680 out of
the remaining 1737, or 39.1%, died under 20 years of age. Thus the effect
of a selection of this kind is to cause a slight increase in the proportion of deaths
at the early ages. If we make the selection slightly more stringent, by taking only
those who died within 40 years of their father’s death, the percentage of individuals
dying under 20 years of age rises to 46.7, and if we go still further and consider
only those who died in their father’s lifetime, then the percentage rises to 82.7.
Looking at the matter in another way we find that the average age at death has
fallen from 35.9 years to 7.7 years.

* See Pearson: Proc. R. S. Vol. 65, p. 291.
† Statistics of Families, p. 73.
‡ Tauchnitz, Leipzig, 1870.

TABLE VI.


Illustrating the Effect of Selection of Material on the Distribution
of Age at Death.


(Reigning Houses in Europe, 18th Century.)


                                          Children who died:

                  All Cases         within 60 years   within 40 years   within 20 years   in their
Age at            Unselected        of their          of their          of their          father's
Death             Data              father's death    father's death    father's death    lifetime

                  Numbers   %       Numbers   %       Numbers   %       Numbers   %            Numbers   %

Under 20          680       37.2    680       39.1    680       46.7    680       62.4         648       82.7
20-39             277       15.1    277       15.9    277       19.0    254       23.3         121       15.4
40-59             336       18.4    336       19.3    274       18.8    127       [illegible]  15        1.9
60-79             450       24.6    395       22.7    214       14.7    29        2.7          -         -
80 and over       86        4.7     49        2.8     10        0.7     -         -            -         -

Totals            1829      -       1737      -       1455      -       1090      -            784       -

Average Age
at Death*         35.9              33.7              26.9              16.2                   7.7


The same facts are given in Fig. 6, which shows that as the selection of cases 
becomes more stringent, there is a regular increase in the proportion of deaths at 
the younger ages. In exactly the same way, the fact that cases where the insanity 
of parent and child is contemporaneous are the most likely to appear in Dr Mott’s 
records causes a spurious exaggeration of the cases of insanity at early ages in the 
younger generation and consequently a spurious exaggeration of the number of 
cases of imbecility and adolescent insanity. 
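
The percentages quoted from Table VI can be recomputed from the counts alone. The Python sketch below uses the numbers of deaths printed in the table; the variable and function names are illustrative only.

```python
# Numbers of deaths from Table VI, by age group, for each degree of selection.
AGE_GROUPS = ["Under 20", "20-39", "40-59", "60-79", "80 and over"]

COUNTS = {
    "all cases, unselected":             [680, 277, 336, 450, 86],
    "within 60 years of father's death": [680, 277, 336, 395, 49],
    "within 40 years of father's death": [680, 277, 274, 214, 10],
    "within 20 years of father's death": [680, 254, 127, 29, 0],
    "in their father's lifetime":        [648, 121, 15, 0, 0],
}


def percent_under_20(counts):
    """Percentage of all deaths in the selection that fall under 20 years."""
    return 100 * counts[0] / sum(counts)


for label, counts in COUNTS.items():
    print(f"{label:36s} total {sum(counts):5d}   under 20: {percent_under_20(counts):.1f}%")
```

The number of early deaths hardly changes from one selection to the next; it is the loss of the late deaths from the total that drives the percentage upward.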

We can also investigate directly the question of anticipation or antedating 
on this material. In order to avoid the heavy weighting of large families which 
would arise if every child were entered, I have taken only one child from each 
family. Let us consider first of all the distribution of age at death of Fathers 
and their First-born Children. The facts are given in Table VII. 


* These averages were calculated, not from the five age groups given above, but from the same
material classified in 15 age groups.

We have altogether 294 cases in which we know the age at death of a father
and his first-born child. None of the fathers died before 20 but of the children
106 out of 294, or 36.1%, died before 20. The average age at death among the
fathers is 61 years, but among the children it is only 36 years, so that there is an
anticipation of 25 years. To borrow Dr Mott’s words, the figures clearly show
the signal tendency among the offspring to die at a much earlier age than
their parents; that is to say, anticipation or antedating is the rule.


[Fig. 6 diagram. Horizontal axis: Age at Death (0-, 20-, 40-, 60-, 80-). Vertical axis: Percentages dying in each age group. Panel titles recoverable from the original: Children who died within 40 years of their Fathers’ death; Children who died within 20 years of their Fathers’ death; Children who died in their Fathers’ lifetime.]


Fig. 6. Diagram to illustrate the Effect of Selection of Material upon the Distribution of
Age at Death. (Reigning Houses in Europe, 18th Century.)


TABLE VII.
Showing Anticipation in Age at Death. A. Fathers and Children.
(Reigning Houses in Europe, 18th Century.)


                                            First-born                 First Sons who
Age at Death                   Fathers      Children      Fathers      had Children

0-9                            -            95            -            -
10-19                          -            11            -            -
20-29                          6            21            4            8
30-39                          16           18            8            15
40-49                          45           31            31           39
50-59                          70           34            54           39
60-69                          77           37            58           44
70-79                          62           33            46           54
80-89                          18           14            12           13
90 and over                    -            -             -            1

Totals                         294          294           213          213

Percentage dying under 20      0            36.1          0            0
Average Age at Death           61           36            60           59
Anticipation                          25                         1


Now in this material there is no selection of families. Every family was taken
and the age at death of every first-born is known, so that we are only left with the
third of Dr Mott’s fallacies, in that no allowance has been made for the fact that
the parental group is limited to ages over 20 while more than a third of the
offspring die under 20. The effect of this selection can be removed almost entirely
by taking instead of the first-born child, the first son who married and had at least
one child. There are in all 213 such cases and we see that there is now no
anticipation. The difference between the average ages at death is less than a year
and by removing the artificial selection we have got rid of all anticipation or
antedating.


These facts are also shown graphically in Figs. 7 and 8. The horizontal scale 
gives the age at death in 10-year groups while the vertical scale gives the actual 
numbers of parents and offspring dying in each age group. The diagram on the 
left shows marked anticipation, and should be compared with Dr Mott’s diagram 
(Fig. 1) in which the ages at onset of insanity of father and child are compared. 
When, however, we get rid of the selection of cases by taking only sons who have 
had children, then there is no anticipation. 
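
The contrast between the two comparisons in Table VII can be reproduced roughly from the grouped counts. The Python sketch below makes two assumptions that are not in the original: every death is placed at the midpoint of its 10-year group, and the fathers' count for 60-69, which is illegible in the scan, is taken as 77 so that the column adds to its printed total of 294. The averages in the text were presumably worked from finer data, so the results agree with them only approximately.

```python
# Midpoint of each 10-year age group, 0-9 up to 90 and over (assumed).
MIDPOINTS = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]

# Counts from Table VII. The 77 is inferred from the column total of 294.
FATHERS         = [0, 0, 6, 16, 45, 70, 77, 62, 18, 0]
FIRST_BORN      = [95, 11, 21, 18, 31, 34, 37, 33, 14, 0]
FATHERS_OF_SONS = [0, 0, 4, 8, 31, 54, 58, 46, 12, 0]
FIRST_SONS      = [0, 0, 8, 15, 39, 39, 44, 54, 13, 1]


def grouped_mean(counts):
    """Approximate mean age at death, placing each death at its group midpoint."""
    return sum(m * c for m, c in zip(MIDPOINTS, counts)) / sum(counts)


# Direct comparison of fathers with all first-born children.
anticipation_first_born = grouped_mean(FATHERS) - grouped_mean(FIRST_BORN)

# Comparison limited to first sons who themselves had children.
anticipation_first_sons = grouped_mean(FATHERS_OF_SONS) - grouped_mean(FIRST_SONS)

print("fathers v. first-born children:    ", round(anticipation_first_born, 1))
print("fathers v. first sons with children:", round(anticipation_first_sons, 1))
```

Even on this crude approximation the first comparison gives an apparent anticipation of over twenty years, and the second gives a difference of about one year.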

If we compare the distributions of age at death in mothers and children we get 
exactly the same results. The facts are shown in Table VIII. 


[Diagram: Reigning Houses in Europe, 18th Century.]

TABLE VIII.
Showing Anticipation in Age at Death. B. Mothers and Children.


(Reigning Houses in Europe, 18th Century.)


                                            First-born                 First Sons to
Age at Death                   Mothers      Children      Mothers      have Children

0-9                            -            122           -            -
10-19                          2            13            [illegible]  -
20-29                          47           26            21           8
30-39                          49           22            30           16
40-49                          43           32            [illegible]  41
50-59                          [illegible]  35            39           40
60-69                          80           42            52           46
70-79                          [illegible]  [illegible]   41           54
80-89                          [illegible]  [illegible]   10           14
90 and over                    [illegible]  [illegible]   1            1

Totals                         [illegible]  [illegible]   220          220

Percentage dying under 20      0.6          39.1          0.5          0
Average Age at Death           53           35            55           59
Anticipation                          18                        -4


We see that the first-born children died on an average 18 years before their
mothers, but when we compare the age at death of mothers and the first son in
each case to have children, then the sons live four years longer than their mothers.
It would have been better in this case to have compared the mothers with the
first daughters to have children but unfortunately von Behr gives very little
information regarding the female lives, except in special cases. The figures show
a marked anticipation in age at death when we directly compare, as Dr Mott
has done, mother and child, but this vanishes when we remove the arbitrary
selection. The same facts are shown graphically in Figs. 9 and 10.


If we combine these figures we can compare the age at death of parent and 
child and the results are shown graphically in Figs. 11 and 12. 


Fig. 11 shows that Dr Mott’s limitation of one of the two generations he is 
comparing to adults, without imposing a similar limitation on the other generation, 
introduces an artificial and spurious anticipation. The average age at death of 
the parents is 56 years and of their first-born children only 35 years—so that we 
get an anticipation of 21 years. If, however, we make the two generations 
almost directly comparable by dealing only with sons who have children—there is 
no significant difference between the two averages (58 against 59 years). 


In these cases we have dealt only with completed families and have taken 
every family without selection. If, however, we consider only the cases in which 
the eldest child died in his father’s lifetime the amount of anticipation is greatly
increased. The facts are shown in Table IX and in Fig. 13.


[Fig. 13. Reigning Houses in Europe, 18th Century: age at death of fathers and of first-born children who died in their fathers’ lifetime. Vertical axis: frequency. Average age at death: fathers 62, children 10; anticipation 52.]


We see here that among the fathers none died under 30 while 87% of their
children died under 30; the average age at death among the fathers was 62,
among the children only 10, showing an anticipation of 52 years.
TABLE IX.

Showing Anticipation in Age at Death. C. Fathers and First-born
Children who died in their Fathers’ Lifetime.
(Reigning Houses of Europe: 18th Century.)

Age at Death              | Fathers | First-born Children dying in their Fathers’ lifetime
0-9                       | -       | 89
10-19                     | -       | 11
20-29                     | -       | 16
30-39                     | 6       | 10
40-49                     | 20      | 5
50-59                     | 32      | 2
60-69                     | 33      | -
70-79                     | 32      | -
80-89                     | 10      | -
Totals                    | 133     | 133
Percentage dying under 20 | 0       | [illegible]
Average Age at Death      | 62      | 10
Anticipation              |         52
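
The percentage quoted in the text for Table IX can be checked from the grouped frequencies. The following is a minimal sketch; the list names and the helper `percent_dying_under` are illustrative only, and the fathers' entry for ages 60 to 69 is taken as 33, which is an assumption read from a damaged cell but is the only value consistent with the printed total of 133.

```python
# Frequencies transcribed from Table IX (ten-year age groups, lower bounds).
age_groups = [0, 10, 20, 30, 40, 50, 60, 70, 80]
fathers = [0, 0, 0, 6, 20, 32, 33, 32, 10]   # 33 is assumed from the total
children = [89, 11, 16, 10, 5, 2, 0, 0, 0]

def percent_dying_under(limit, counts):
    """Percentage of a column whose age group lies wholly below `limit`."""
    below = sum(n for age, n in zip(age_groups, counts) if age + 10 <= limit)
    return 100.0 * below / sum(counts)

print(sum(fathers), sum(children))                  # both columns total 133
print(round(percent_dying_under(30, fathers), 1))   # no father died under 30
print(round(percent_dying_under(30, children), 1))  # about 87 per cent
```

The averages in the table cannot be recovered in the same way, since they were presumably computed from the individual ages and not from the ten-year groups.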


It is now possible to illustrate the effect of the principal fallacies which vitiate
Dr Mott’s conclusions. In the first place he has dealt with families which are
largely incomplete and has collected his material in such a way that cases in
which the insanity of parent and child is contemporaneous are the most likely to
be recorded; in the second place he has directly compared parent and child without
allowing for the fact that practically no parent can become insane before 20,
while there is no limitation of this kind among the offspring of these insane
parents.

In Table IX and Fig. 13 we see the effect of dealing with incomplete families
in which the children died in their fathers’ lifetime. There we get an anticipation
of 52 years. If we get rid of the first and second fallacies involving a
selection of cases by dealing with every family, as shown in Table VII and Fig. 7,
the anticipation falls to 26 years. If we get rid of the third source of fallacy
also, by comparing the fathers with the first sons who have children, as in Table VII
and Fig. 8, then the anticipation falls to less than a year. The Law of Anticipation
or Antedating has thus in Dr Mott’s case no foundation, in fact it is a
spurious result of the mode of collecting and interpreting data.
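
The way in which limiting one generation to adults manufactures an apparent anticipation can be illustrated by a small simulation. This is only a sketch under stated assumptions: the mortality law (35 per cent dying in childhood, the rest near 60) is hypothetical and is not taken from von Behr's records, and the function `simulate` is an illustrative name. Parents and children are drawn independently from the same law, so that no real anticipation exists.

```python
import random

def simulate(n=20000, seed=0):
    rng = random.Random(seed)

    def age_at_death():
        # Hypothetical mortality: 35% die in childhood, the rest around 60.
        if rng.random() < 0.35:
            return rng.uniform(0, 20)
        return max(20.0, rng.gauss(60, 12))

    # Parents must have survived to adult life; children are unrestricted.
    parents = [a for a in (age_at_death() for _ in range(n)) if a >= 20]
    children = [age_at_death() for _ in range(n)]
    adult_children = [a for a in children if a >= 20]

    def mean(xs):
        return sum(xs) / len(xs)

    spurious = mean(parents) - mean(children)    # one generation selected only
    fair = mean(parents) - mean(adult_children)  # both selected alike
    return spurious, fair

spurious, fair = simulate()
print(round(spurious, 1), round(fair, 1))
```

The first difference is large although the two generations obey the same law; the second, where both are restricted in the same manner, is close to zero.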

Now Dr Mott has not only asserted that this “ Law” applies to insanity but 
has also drawn the conclusion that the offspring of insane parents if still normal 
at the age of 25 may safely marry. In an address delivered before the First
International Eugenics Congress*, he said: “You will observe that 47.8% of the
500 offspring had their first attack (of insanity) at or before the age of 25 years
and as you see in the curves of parents and offspring, the liability of the child of
an insane parent becoming insane tends rapidly to fall. Now besides the fact
that this shows Nature’s method of eliminating unsound elements of a stock,
it has another important bearing, for it shows that after twenty-five there is a
greatly decreasing liability of the offspring of insane parents to become insane
and therefore in the question of advising marriage of the offspring of an insane
parent this is of great importance. Sir George Savage recently said that this
statistical proof [sic!] of mine entirely accorded with his own experiences, and that
if an individual who had such an hereditary history had passed twenty-five and
never previously shown any signs (of insanity) he would probably be free and he
would offer no objection to marriage.”


Now I entirely fail to understand how anyone could recommend marriage in 
such cases, even on Dr Mott’s own figures; for if it be true that 48% become
insane before 25, it must be equally true that 52% become insane after that age
and this very important point seems to have been forgotten. These figures,
however, are taken from Dr Mott’s selected data, selected in such a way that the
early cases are enormously exaggerated. Until Dr Mott publishes a series of
complete pedigrees, it will be safer to assume that the age at onset of insanity
among the offspring of insane parents does not differ widely from that of all
admissions to Asylums and there we find that only 21% become insane before 25,
and 79% after 25.


But surely at a Eugenics Congress of all places some thought might have been 
given to the mental condition of the children resulting from such matings, before 
advising marriage. It would not have been difficult for Dr Mott to have extracted 
all the available cases of this kind from his collection of pedigrees, i.e. all cases in 
which an individual had an insane parent and was normal at the age of 25, and so 
have discovered the probable fate of the offspring from such matings. 


Unfortunately the details given by Dr Mott regarding his pedigrees are usually 
so scanty that little use of them can be made, but two at least show the danger of 
the matings Sir George Savage and he sanction; these two pedigrees were given 
by Dr Mott in his lecture on Heredity in Relation to Insanity, delivered to the 
members of the London County Council. The first is shown in Fig. 14. (It 
appeared as Fig. 11, p. 18 of Dr Mott’s lecture.) In the first generation a man 
who became insane at 70 had four children. The eldest, a girl, became insane at 
68 and was therefore normal long after the age of 25. Dr Mott does not state 
whether the marriage of this woman preceded or followed the onset of insanity in 
her father, but even if her father had become insane before her marriage, Dr Mott
would have raised no objection to the marriage since the woman herself was not
insane. There were in all six children from this marriage of which Dr Mott would
have approved. Two became insane, three were blind and five are said to have
been paupers.

* Problems in Eugenics, p. 425. This is one of many illustrations of the evil done by that Congress;
attention was directed and much weight given to hasty statements and ill-digested material.


[Fig. 14. Pedigree chart. Key: marked symbol = Insane; P = Pauper; B = Blind.]
The eldest child remained normal till the age of 34 and although both his
parents became insane Dr Mott apparently would not have objected to his marrying. 
He did so and one of his children became insane and eight out of nine are said to 
be paupers. These nine children are apparently still young so that their ultimate 
fate is still uncertain. 


The second pedigree I shall quote was given as Fig. 28, p. 33 of Dr Mott’s 
lecture, and appears here as Fig. 15. 


A man who had an insane father and an insane grandfather became insane at 
the age of 55. He was therefore normal at the age of 25* and Dr Mott would 
have sanctioned marriage in his case. He actually married twice. His first wife 
was tuberculous but not insane; they had two children, both insane. His second 
wife was normal and it is definitely stated that there was no insanity in her 
family; they had five children and one of these became insane. Yet Dr Mott
would permit the children of insane parents to marry if only they are normal
at the age of 25!


Again, Dr Mott has stated that it is useless to attempt to limit the fertility of 
the insane since most of their children are born before the onset of insanity, 
and therefore before any action can be taken. From his statistics of relatives in 
L.C.C. Asylums, Dr Mott has calculated the proportion of offspring who were born 
after the first attack of insanity in the parent and found that “Forty-six offspring
out of 581 were born after the first attack of insanity in the parent, i.e., 7.9%.
That is to say in the case of 529 insane parents, the birth of only one-twelfth of
their 581 insane children would have been prevented by sterilisation or life segregation
of the parent after the first attack of insanity. These figures refer to the offspring
which become insane, but there are a large number of offspring which do not become
insane and these would be cut off if life segregation or sterilisation were adopted†.”


But here again Dr Mott is using the data obtained from his index of relatives 
which shows a greatly exaggerated number of cases at the earlier ages among the 
offspring, and he thus greatly exaggerates the number of cases in which the children 
were born before the onset of insanity. No conclusion can be drawn from any but 
complete records of families. But apart altogether from this, many of these parents 
are themselves the children of the insane and much could be done to discourage 
such marriages. Unfortunately as we have seen Dr Mott directly sanctions 
marriage to those who remain normal till the age of 25. 


In further support of his view Dr Mott has stated that out of 642 females 
admitted to three London County Asylums in 1911, 148 were recurrent cases and 
of these 32 (21%) had children between their respective dates of admission.
“The inference that can be drawn,” he says, “is that about one-fifth of the
recurrent cases, or approximately one-twentieth of the female admissions have
children after their first attack of insanity and of 31 such cases examined, 73 children
were born after the first attack of insanity in the parent.”

* If the term “age at onset” has any real meaning.
† The italics are Dr Mott’s.
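
The proportions quoted from Dr Mott are simple ratios of the counts he gives, and may be verified directly. The variable names below are illustrative only; the figures are those stated in the two quotations.

```python
# Offspring born after the parent's first attack, out of all insane offspring.
born_after_attack, insane_offspring = 46, 581
share_born_after = 100 * born_after_attack / insane_offspring
print(round(share_born_after, 1))          # the "7.9 per cent", about one-twelfth

# Female admissions in 1911, recurrent cases, and recurrent cases with children.
female_admissions, recurrent, with_children = 642, 148, 32
share_of_recurrent = with_children / recurrent
share_of_admissions = with_children / female_admissions
print(round(share_of_recurrent, 3))        # "about one-fifth"
print(round(share_of_admissions, 3))       # "approximately one-twentieth"
```

The arithmetic is therefore not in dispute; the objection raised in the text concerns the selection of the material and the want of any following up of the cases.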


But have these 148 recurrent cases been followed up to the end of the repro- 
ductive period? Not at all. No ages are given and the cases are merely those 
which were admitted to Asylums in 1911, Dr Mott’s remarks being made in 
June 1912, so that no attempt has been made to follow them up. There is no 
justification for Dr Mott’s advice. 


There are many other points in Dr Mott’s work which deserve detailed exami- 
nation, but time will not permit more than a brief account of a few of them. 


It should be noted, for instance, that Dr Mott has used his index of relatives 
in London County Asylums as an argument in favour of the importance of the 
inheritance factor in insanity. His argument is as follows: 


“At the present time in the London County Asylums there are 725 individuals 
so closely related as parents and offspring, brothers and sisters. A priori, this, to 
my mind, is striking proof of the importance of heredity in relation to insanity, 
for we cannot suppose that 20,000 of the 4½ millions of people in London brought
together from some random cause would show such a large number closely related
...”

But Dr Mott has not attempted to give, and I doubt if he ever will be able to 
give, a satisfactory estimate of the number of relatives in even a random sample 
of the population, and the population of asylums is far from being a random 
sample of the general population; there is for instance an extraordinary divergence
in age. Yet without definite information on this point it would be impossible to
say whether insanity is inherited or not, that is if we had to depend solely on
Dr Mott’s data.


It should also be noted that in these cases Dr Mott has clubbed together 
every form of insanity, from congenital idiocy to senile dementia, except of course 
cases due to specific infections or trauma. I myself think that course is the only 
possible one. To anyone who has studied even a few pedigrees of mental defect, 
nothing is more striking than the extraordinary number of different forms of 
mental defect that may appear in the same family. 


Seven years ago, in a First Study of the Statistics of Insanity and of the 
Inheritance of the Insane Diathesis*, I was confronted with the same problem, 
and after a full consideration of all the available data and of the opinions of those 
medical men who were best qualified to express an opinion came to the conclusion 
that the only possible course was to group all forms of insanity together, with, 
of course, the exceptions I have already indicated. The whole question was dis- 
cussed very fully in my paper and it was there suggested that an even broader 
classification might be of service. This point of view met with some criticism at
the time but nothing has occurred to alter it, and the study of the inheritance of
insanity in general or of an even broader degeneracy must always remain the
first object of our studies.

* Galton Memoirs, No. II. (Dulau and Co.)


Any investigation of the inheritance of special types of insanity or degeneracy 
can only be carried out however on unselected material—on the records of com- 
plete families. The type of insanity is so closely related to the age of onset that 
any tendency to exaggerate the number of early cases, as in Dr Mott’s material, 
will entirely vitiate the conclusions drawn. Thus Dr Schuster’s conclusions as to 
the inheritance of special types of insanity based upon Dr Mott’s data* must also 
be rejected on the above grounds. 


Dr Mott’s index of relatives in London County Asylums is unfortunately of 
very little value in the study of inheritance in insanity. Progress can only come 
from the study of complete pedigrees in which every member of the family is 
entered, whether insane or normal, and the ages of the normal at the time the 
record was made are just as important as the age at onset of insanity in the insane 
members, for a statement that a young man of 20 has not been insane is of a 
very different degree of importance from the statement that a man of 70 has 
not been insane. 


In the papers I have cited the children of the insane if normal at 25 are 
advised to marry, and it is asserted that it is useless to attempt to discourage the 
reproduction of the insane since most of their children are born before the onset of 
insanity, and that we should rely on the Law of Anticipation to end or mend 
degenerate stocks. 


I have shown, I think, that the Law of Anticipation as applied to the insane 
has no foundation in the facts provided and that the advice given as to the marriage 
of the insane and of their normal offspring is fundamentally unsound and directly 
cacogenic. Much yet remains to be learnt regarding the inheritance of the insane
diathesis, but no one who has studied the family histories of the insane can doubt
that in inheritance we have by far the most important element in the production
of insanity, and in view of all the facts it is the obvious duty of the Eugenist to
discourage, rather than to encourage, procreation by the insane and even by those 
of their offspring who appear to be normal. 


* Report on the Statistical Investigation of Relative Cards, 21st Annual Report of the London 
County Council Asylums Committee (1910), p. 95. 
ON THE PROBABLE ERROR OF THE BI-SERIAL
EXPRESSION FOR THE CORRELATION COEFFICIENT.


By H. E. SOPER, M.A., Biometric Laboratory, University of London.


In a recent paper* Professor Pearson shows that where one character is in 
multiple graded grouping and the other in alternative categories, greater or less 
than a given magnitude, the correlation coefficient admits of simple expression ; 
the assumptions being that the unmeasured character, B, has a normal distribution 
and that the measured character, A, has linear regression upon B. Under these 
conditions the data required are the numerical ratio of the alternative groups, 
the standard deviation of the measured character and the deviation from the 
general mean of this character of the mean of one of the groups. 


This expression is subject to greater fluctuations of value in samples of N
of the population than is the product moment form, especially where one of the
groups is relatively small; and it is proposed to find formulae for the mean and
second moment of the errors from this mean to a first approximation, that is to
terms in 1/N. These will appear in terms of the correlation coefficient, r, of the
original population (which will be supposed normally correlated) and the fractional
frequency, f, in that population of the group possessing the greater or positive
intensity of the character put into two classes.

Let $y$ be the graded character and $x$ the alternative character the intenser
value of which is possessed by the fraction $f$ of the population. Let $\bar{x}$, $\bar{y}$ be the
general means and $\sigma_x$, $\sigma_y$ the standard deviations of $x$ and $y$. Then it is shown
in the paper that if $\bar{x}'$, $\bar{y}'$ are the means of the group $f$,

$$r = \frac{\bar{y}' - \bar{y}}{\sigma_y} \Big/ \frac{\bar{x}' - \bar{x}}{\sigma_x} \qquad (1),$$

on the assumption that the regression of $y$ upon $x$ is linear; and that if $x$ be
normally distributed this is equivalent to

$$r = \frac{\bar{y}' - \bar{y}}{\sigma_y} \times \frac{f}{z} \qquad (2),$$


* Biometrika, Vol. VII. (1909), p. 96: “On a New Method of Determining Correlation between
a Measured Character A, and a Character B, of which only the Percentage of Cases wherein B
exceeds (or falls short of) a given intensity is recorded for each grade of A.”
where $z$ is the ordinate of the normal curve cutting off the area
$$f = \tfrac{1}{2}(1 - \alpha) \ \text{or} \ \tfrac{1}{2}(1 + \alpha)$$
as defined in Sheppard’s Tables of the Probability Integral.
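
The bi-serial expression, in the form $r = \{(\bar{y}' - \bar{y})/\sigma_y\} \times f/z$, is easily computed from raw observations. The sketch below is an illustration and not the author's procedure: the function name `biserial_r` and the small data set are hypothetical, the group is taken to occupy the upper tail of the ungraded character, and the normal ordinate is obtained from Python's standard `statistics.NormalDist` in place of Sheppard's tables.

```python
from statistics import NormalDist, fmean, pstdev

def biserial_r(y, in_group):
    """Bi-serial r: (mean of group - general mean) / sigma_y * f / z."""
    group = [v for v, flag in zip(y, in_group) if flag]
    f = len(group) / len(y)              # fractional frequency of the group
    a = NormalDist().inv_cdf(1.0 - f)    # division cutting off upper-tail area f
    z = NormalDist().pdf(a)              # ordinate of the normal curve there
    return (fmean(group) - fmean(y)) / pstdev(y) * f / z

# Hypothetical graded measurements and alternative category.
y = [1, 2, 3, 4, 5, 6]
in_group = [False, True, False, True, False, True]
r = biserial_r(y, in_group)
print(round(r, 4))
```

The standard deviation is taken about the general mean with divisor N, in agreement with the moment definitions used in the paper.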
Now if $p_0, p_1, p_2$, etc. be the moment coefficients of the whole population with
respect to the character $y$ defined by

$$p_u = S(n_s y_s^u)/N \qquad (3),$$

[see bi-serial table below which is here to be looked upon as representing the
general population] and $p_0', p_1', p_2'$, etc. are moment coefficients of the group
$n'\,(= fN)$ defined by

$$p_u' = S(n_s' y_s^u)/N \qquad (4),$$

we may write $f = p_0'$, $f\bar{y}' = p_1'$, $\bar{y} = p_1$, $\sigma_y = \sqrt{p_2 - p_1^2}$ and

$$r = \frac{p_1' - p_0' p_1}{\sqrt{p_2 - p_1^2}\; z} \qquad (5).$$
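Bi-serial Table (grades of y run across; the two alternative groups and their totals run down).

Grades of y    | y_1   | y_2   | y_3   | ... | Totals
Group n'       | n_1'  | n_2'  | n_3'  | ... | n'
Group n''      | n_1'' | n_2'' | n_3'' | ... | n''
Totals         | n_1   | n_2   | n_3   | ... | N

$n'/N = f$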

In samples of $N$ the frequencies $n_s$, $n_s'$, $n_s''$ and consequently the moment
coefficients $p$ and $p'$ and the ordinate $z$ are subject to fluctuation and the values
of the correlation coefficient calculated from this formula will have a distribution
of errors. Let $\bar{r}$ be the mean value in such samples, $\delta r$ the deviation from this
mean value in any sample when $\delta p_0'$, $\delta p_1$, $\delta p_1'$, etc., $\delta z$ are the deviations in moments
and ordinate. Then

$$\bar{r} + \delta r = \frac{p_1' + \delta p_1' - (p_0' + \delta p_0')(p_1 + \delta p_1)}{\sqrt{p_2 + \delta p_2 - (p_1 + \delta p_1)^2} \times (z + \delta z)} \qquad (7).$$

To express $\delta z$ in terms of deviations of the moments we have, $a$ being the abscissa of the division,

$$f = \frac{1}{\sqrt{2\pi}} \int_a^{\infty} e^{-\frac{1}{2}x^2}\,dx \qquad (8).$$
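Hence to second order terms in powers and products of deviations

$$z + \delta z = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(a + \delta a)^2} = z\{1 - a\,\delta a - \tfrac{1}{2}(1 - a^2)(\delta a)^2\},$$

$$\delta f = -\frac{1}{\sqrt{2\pi}} \int_a^{a + \delta a} e^{-\frac{1}{2}x^2}\,dx
= -\frac{1}{\sqrt{2\pi}} \int_0^{\delta a} e^{-\frac{1}{2}(a + \xi)^2}\,d\xi$$

$$= -z \int_0^{\delta a} \{1 - a\xi - \tfrac{1}{2}(1 - a^2)\xi^2 + \dots\}\,d\xi
= -z\{\delta a - \tfrac{1}{2}a(\delta a)^2\}.$$

It follows that

$$z + \delta z = z + a\,\delta f - \frac{1}{2z}(\delta f)^2
= z + a\,\delta p_0' - \frac{1}{2z}(\delta p_0')^2 \qquad (10).$$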
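
The second order expansion of the ordinate, $z + \delta z = z + a\,\delta f - (\delta f)^2/2z$, can be tested numerically. The sketch below is an illustration under stated assumptions: $f$ is taken as the area of the upper tail beyond the abscissa $a$, the values $f = 0.20$ and $\delta f = 0.01$ are arbitrary, and the helper `ordinate` is a hypothetical name built on `statistics.NormalDist`.

```python
from statistics import NormalDist

nd = NormalDist()

def ordinate(f):
    """Ordinate z of the unit normal curve at the division with upper-tail area f."""
    return nd.pdf(nd.inv_cdf(1.0 - f))

f, df = 0.20, 0.01
a = nd.inv_cdf(1.0 - f)      # abscissa of the division
z = ordinate(f)

exact = ordinate(f + df)
first_order = z + a * df
second_order = z + a * df - df**2 / (2 * z)   # the expansion in the text

print(abs(exact - first_order), abs(exact - second_order))
```

The error of the second order form is of the order of the cube of the deviation, and is much smaller than that of the first order form.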


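At the same time that this value is put in (7) we may simplify the expression
and subsequent algebra by supposing the graded character $y$ to be measured
from its universal mean value as origin and in terms of its standard deviation as
unit of measurement in which case

$$p_1 = 0, \quad p_2 = \sigma_y^2 = 1, \quad \text{and by (5)} \quad p_1' = zr \qquad (11),$$

and since $p_0' = f$ (7) becomes,

$$\bar{r} + \delta r = \frac{zr + \delta p_1' - f\,\delta p_1 - \delta p_0'\,\delta p_1}{\sqrt{1 + \delta p_2 - (\delta p_1)^2}\,\{z + a\,\delta p_0' - \frac{1}{2z}(\delta p_0')^2\}} \qquad (12).$$

Expanding to second order of deviations we find,

$$\bar{r} + \delta r = r + \epsilon_1 + \epsilon_2 \qquad (13),$$

where $\epsilon_1$, $\epsilon_2$ are the first order and second order expressions

$$\epsilon_1 = \frac{1}{z}\{\delta p_1' - f\,\delta p_1 - \tfrac{1}{2}zr\,\delta p_2 - ar\,\delta p_0'\},$$

$$\epsilon_2 = \frac{1}{z}\Big\{\tfrac{1}{2}ar\,\delta p_2\,\delta p_0' - \frac{a}{z}\,\delta p_1'\,\delta p_0' + \frac{af}{z}\,\delta p_1\,\delta p_0' - \tfrac{1}{2}\delta p_1'\,\delta p_2 + \tfrac{1}{2}f\,\delta p_1\,\delta p_2$$
$$\qquad - \delta p_0'\,\delta p_1 + \tfrac{1}{2}zr(\delta p_1)^2 + \tfrac{3}{8}zr(\delta p_2)^2 + (1 + 2a^2)\frac{r}{2z}(\delta p_0')^2\Big\} \qquad (14).$$

Taking mean values

$$\bar{r} = r + \text{mean}\ \epsilon_2 \qquad (15),$$

mean $\epsilon_1$ being zero since by (3), (4)

$$\text{mean}\ \delta p_u = S(\text{mean}\ \delta n_s\, y_s^u)/N = 0,$$
$$\text{mean}\ \delta p_u' = S(\text{mean}\ \delta n_s'\, y_s^u)/N = 0.$$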
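Mean $\epsilon_2$ is to be evaluated by the formulae

$$\text{mean}\ \delta p_u\,\delta p_v = (p_{u+v} - p_u p_v)/N,$$
$$\text{mean}\ \delta p_u'\,\delta p_v' = (p'_{u+v} - p_u' p_v')/N,$$
$$\text{mean}\ \delta p_u\,\delta p_v' = (p'_{u+v} - p_u p_v')/N \qquad (16),$$

of which the first two are well known* and the third may be proved thus:

$$N\,\delta p_u = S(\delta n_s\, y_s^u) = S(\delta n_s'\, y_s^u) + S(\delta n_s''\, y_s^u),$$
$$N\,\delta p_v' = S(\delta n_s'\, y_s^v),$$
$$\therefore\ N^2\,\delta p_u\,\delta p_v' = S\{(\delta n_s')^2 y_s^{u+v}\} + S\{\delta n_s'\,\delta n_{s'}'\, y_s^u y_{s'}^v\} + S\{\delta n_s''\,\delta n_{s'}'\, y_s^u y_{s'}^v\},$$

where in the third sum $s'$ may or may not equal $s$.
But
$$\text{mean}\ (\delta n_s')^2 = n_s'(1 - n_s'/N),$$
$$\text{mean}\ (\delta n_s'\,\delta n_{s'}') = -n_s' n_{s'}'/N,$$
$$\text{mean}\ (\delta n_s''\,\delta n_{s'}') = -n_s'' n_{s'}'/N,$$
the last whether $s = s'$ or not. Thus we find†
$$\text{mean}\ \delta p_u\,\delta p_v' = (p'_{u+v} - p_u' p_v' - p_u'' p_v')/N = (p'_{u+v} - p_u p_v')/N.$$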


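Evaluating mean $\epsilon_2$ by these formulae we find,

$$\bar{r} = r + \frac{1}{Nz}\Big\{\tfrac{1}{2}ar(p_2' - p_2 p_0') - \frac{a}{z}(p_1' - p_1' p_0') + \frac{af}{z}(p_1' - p_1 p_0') - \tfrac{1}{2}(p_3' - p_1' p_2) + \tfrac{1}{2}f(p_3 - p_1 p_2)$$
$$\qquad - (p_1' - p_0' p_1) + \tfrac{1}{2}zr(p_2 - p_1^2) + \tfrac{3}{8}zr(p_4 - p_2^2) + (1 + 2a^2)\frac{r}{2z}(p_0' - p_0'^2)\Big\} \qquad (17),$$

in which the undashed moments, being those of a normal curve with unit standard
deviation, about its mean, have the values

$$p_1 = 0, \quad p_2 = 1, \quad p_3 = 0, \quad p_4 = 3,$$

and the dashed moments beyond the first two,

$$p_2', \quad p_3',$$

have values depending upon the nature of the frequency distribution of $y$ and $x$.
Assuming $x$, $y$ normally distributed‡

$$p_2' = \int_{x=a}^{\infty}\int_{y=-\infty}^{\infty} \frac{y^2}{2\pi\sqrt{1 - r^2}}\, e^{-\frac{x^2 - 2rxy + y^2}{2(1 - r^2)}}\,dy\,dx,$$

$$p_3' = \int_{x=a}^{\infty}\int_{y=-\infty}^{\infty} \frac{y^3}{2\pi\sqrt{1 - r^2}}\, e^{-\frac{x^2 - 2rxy + y^2}{2(1 - r^2)}}\,dy\,dx \qquad (18).$$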


* See Biometrika, Vol. II. (1903), p. 275: “On the Probable Errors of Frequency Constants.”
K. Pearson. The second follows in exactly the same manner as the first, since the constancy of the
total frequency dealt with is only involved, in deducing the relations (i) and (ii) of p. 274, of that
paper.

† $p'_{u+v} = S(n_s' y_s^{u+v})/N$.

‡ Since the moments appear in the term containing 1/N any errors in their calculation due to
incorrect assumption of normality will not affect the present approximate formulae provided such errors
are of the order 1/N.
Putting $y = rx + \eta$ and integrating with respect to $\eta$ for constant $x$,

$$p_2' = \int_a^{\infty} \{(rx)^2 + (1 - r^2)\} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2}\,dx = f + azr^2,$$

$$p_3' = \int_a^{\infty} \{(rx)^3 + 3(rx)(1 - r^2)\} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2}\,dx = zr\{3 + (a^2 - 1)r^2\} \qquad (19).$$

When these values are put in (17), and terms collected, the mean value of the
bi-serial correlation coefficient in samples of $N$ is found to be

$$\bar{r} = r\Big[1 + \frac{1}{N}\Big\{(1 + 2a^2)\frac{ff'}{2z^2} + (f - f')\frac{a}{z} - \frac{3}{4} + \frac{1}{2}r^2\Big\}\Big] \qquad (20),$$

where $f' = 1 - f = n''/N$.


In the work of obtaining this approximation all powers and products of 
deviations above the second order have been neglected. The means of such 
terms in samples of N involve second and higher powers of 1/N* and the present
result is correct to the first approximation. 


Again squaring (13) and taking mean values and subtracting the square of
(15) we find to the same approximation as before

$$\sigma_r^2 = \text{mean}\ (\delta r)^2 = \text{mean}\ \epsilon_1^2 \qquad (21).$$

The evaluation of mean $\epsilon_1^2$ being carried out precisely in the same way as
mean $\epsilon_2$, the result is the second moment of deviations of the bi-serial correlation
coefficient in samples of $N$,

$$\sigma_r^2 = \frac{1}{N}\Big[\frac{ff'}{z^2} - \Big\{\frac{5}{2} - \frac{ff'a^2}{z^2} - (f - f')\frac{a}{z}\Big\}r^2 + r^4\Big] \qquad (22).$$

Writing the two results (20) and (22)

$$\bar{r} = r\{1 + (\psi_1 + \tfrac{1}{2}r^2)/N\},$$
$$\sigma_r^2 = (\chi_\alpha^2 - \psi_2 r^2 + r^4)/N \qquad (23),$$

the values of $\psi_1$, $\chi_\alpha^2$† and $\psi_2$ for values of $\tfrac{1}{2}(1 - \alpha)$ [= the smaller of $n'/N$, $n''/N$]
from .50 to .01 are to be found in table (24).


* See Biometrika, Vol. IX. (1913), pp. 97-99.
† $\chi_\alpha$ for $\tfrac{1}{2}(1 + \alpha)$ was tabled in Biometrika, Vol. IX. p. 27, and the table is reproduced in Tables for
Statisticians and Biometricians, p. 35, Cambridge University Press.
½(1−α) | ψ_1    | χ_α²   | ψ_2    || ½(1−α) | ψ_1     | χ_α²    | ψ_2
.50    | .0354  | 1.5708 | 2.5000 || .20    | −.0871  | 2.0414  | 2.8578
.49    | .0353  | 1.5711 | 2.5003 || .19    | −.1001  | 2.0898  | 2.8951
.48    | .0350  | 1.5722 | 2.5011 || .18    | −.1146  | 2.1437  | 2.9364
.47    | .0346  | 1.5741 | 2.5024 || .17    | −.1308  | 2.2035  | 2.9825
.46    | .0339  | 1.5766 | 2.5043 || .16    | −.1490  | 2.2703  | 3.0341
.45    | .0331  | 1.5799 | 2.5068 || .15    | −.1696  | 2.3453  | 3.0923
.44    | .0321  | 1.5839 | 2.5098 || .14    | −.1931  | 2.4303  | 3.1582
.43    | .0309  | 1.5886 | 2.5134 || .13    | −.2201  | 2.5272  | 3.2337
.42    | .0295  | 1.5943 | 2.5177 || .12    | −.2513  | 2.6389  | 3.3208
.41    | .0279  | 1.6007 | 2.5225 || .11    | −.2880  | 2.7687  | 3.4224
.40    | .0260  | 1.6079 | 2.5279 || .10    | −.3317  | 2.9221  | 3.5427
.39    | .0240  | 1.6161 | 2.5341 || .095   | −.3568  | 3.0095  | 3.6115
.38    | .0217  | 1.6251 | 2.5409 || .090   | −.3844  | 3.1057  | 3.6873
.37    | .0191  | 1.6351 | 2.5484 || .085   | −.4153  | 3.2119  | 3.7713
.36    | .0163  | 1.6461 | 2.5568 || .080   | −.4499  | 3.3299  | 3.8648
.35    | .0132  | 1.6582 | 2.5659 || .075   | −.4887  | 3.4620  | 3.9696
.34    | .0098  | 1.6714 | 2.5759 || .070   | −.5324  | 3.6110  | 4.0879
.33    | .0062  | 1.6858 | 2.5868 || .065   | −.5822  | 3.7806  | 4.2294
.32    | .0021  | 1.7015 | 2.5986 || .060   | −.6403  | 3.9748  | 4.3777
.31    | −.0023 | 1.7186 | 2.6115 || .055   | −.7083  | 4.2002  | 4.5584
.30    | −.0070 | 1.7371 | 2.6256 || .050   | −.7897  | 4.4652  | 4.7723
.29    | −.0122 | 1.7573 | 2.6409 || .045   | −.8868  | 4.7829  | 5.0283
.28    | −.0179 | 1.7791 | 2.6575 || .040   | −1.0053 | 5.1715  | 5.3410
.27    | −.0241 | 1.8028 | 2.6755 || .035   | −1.1577 | 5.6568  | 5.7362
.26    | −.0308 | 1.8286 | 2.6952 || .030   | −1.3556 | 6.2859  | 6.2485
.25    | −.0382 | 1.8567 | 2.7166 || .025   | −1.6308 | 7.1347  | 6.9481
.24    | −.0462 | 1.8874 | 2.7399 || .020   | −2.0272 | 8.3600  | 7.9572
.23    | −.0550 | 1.9208 | 2.7654 || .015   | −2.5263 | 10.3024 | 9.4975
.22    | −.0647 | 1.9574 | 2.7933 || .010   | −3.8889 | 13.9393 | 12.6086
.21    | −.0753 | 1.9974 | 2.8240 ||

(24)
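
The three tabled quantities can be recomputed from the normal curve. The sketch below assumes the forms $\chi_\alpha^2 = ff'/z^2$, $\psi_1 = (1 + 2a^2)ff'/2z^2 + (f - f')a/z - \tfrac{3}{4}$ and $\psi_2 = \tfrac{5}{2} - a^2 ff'/z^2 - (f - f')a/z$, with $a$ the abscissa and $z$ the ordinate at the division; these forms are inferred here from the expansion and from the tabled values, and the function name `soper_constants` is hypothetical.

```python
from statistics import NormalDist

def soper_constants(f):
    """Return (psi1, chi2, psi2) for the smaller-group fraction f = (1 - alpha)/2."""
    nd = NormalDist()
    a = nd.inv_cdf(1.0 - f)           # division, in standard-deviation units
    z = nd.pdf(a)                     # ordinate at the division
    chi2 = f * (1.0 - f) / z**2
    slope = (2.0 * f - 1.0) * a / z   # (f - f') a / z
    psi1 = (1.0 + 2.0 * a * a) * chi2 / 2.0 + slope - 0.75
    psi2 = 2.5 - a * a * chi2 - slope
    return psi1, chi2, psi2

for f in (0.50, 0.20, 0.10):
    print(f, [round(v, 4) for v in soper_constants(f)])
```

For these three fractions the computed values agree with the corresponding rows of the table to the fourth decimal place.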


The bi-serial value of the correlation coefficient has the standard deviation
$$\sigma_r = (\chi_\alpha^2 - \psi_2 r^2 + r^4)^{\frac{1}{2}}/\sqrt{N},$$
whilst that of the product moment value is
$$\sigma_r = (1 - r^2)/\sqrt{N}.$$
In table (25) a comparison of the values of the numerator is made for five
values of r, for divisions at 0, .5, ... 2.5 times the standard deviation from the mean
of the ungraded character.

Values of $\sqrt{\chi_\alpha^2 - \psi_2 r^2 + r^4}$ for the stated values of ½(1−α), compared with values of $1 - r^2$:

Values of r | 1 − r² | .500 | .309        | .159 | .067        | .023        | .006
.00         | 1.00   | 1.25 | 1.31        | 1.51 | 1.93        | 2.76        | 4.5
.25         | .9375  | 1.19 | 1.25        | 1.45 | 1.86        | 2.68        | 4.3
.50         | .750   | 1.00 | 1.06        | 1.26 | 1.65        | [illegible] | 4.0
.75         | .4375  | .69  | .75         | .94  | 1.30        | 1.95        | 3.2
1.00        | .00    | .27  | [illegible] | .49  | [illegible] | 1.13        | 1.8

(25)
Thus the effect of grouping and applying the bi-serial value of the correlation
coefficient is to add 25% to the probable error in the most accordant case where
r is zero and the division equal, whilst if r is as large as .5 and one group as small
as 10% of the whole the probable error is nearly doubled. For higher values of
r the errors of sampling, in the case of the product moment formula, grow smaller
and ultimately vanish when r = 1; but the bi-serial values are not invariable in
samples drawn from a perfectly correlated population but possess a variability as
high as .27/√N in the most favourable case when the grouping is equal.
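
The three statements of this paragraph can be checked by computing the two numerators side by side. The sketch assumes the bi-serial numerator $\sqrt{\chi_\alpha^2 - \psi_2 r^2 + r^4}$ with $\chi_\alpha^2 = ff'/z^2$ and $\psi_2 = \tfrac{5}{2} - a^2 ff'/z^2 - (f - f')a/z$; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def biserial_numerator(r, f):
    """sqrt(chi^2 - psi2 r^2 + r^4): bi-serial standard deviation times sqrt(N)."""
    nd = NormalDist()
    a = nd.inv_cdf(1.0 - f)
    z = nd.pdf(a)
    chi2 = f * (1.0 - f) / z**2
    psi2 = 2.5 - a * a * chi2 - (2.0 * f - 1.0) * a / z
    return sqrt(chi2 - psi2 * r**2 + r**4)

def product_moment_numerator(r):
    """1 - r^2: product moment standard deviation times sqrt(N)."""
    return 1.0 - r**2

for r, f in ((0.0, 0.5), (0.5, 0.1), (1.0, 0.5)):
    print(r, f, round(biserial_numerator(r, f), 3),
          round(product_moment_numerator(r), 3))
```

With r zero and equal division the ratio of the two numerators is about 1.25; with r = .5 and a group of one-tenth it is about 1.93; and with r = 1 and equal division the bi-serial numerator is about .27 while the product moment numerator vanishes.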


If the standard deviation be calculated from the approximate formula,

$$\sigma_r = \Big\{\frac{\sqrt{ff'}}{z} - r^2\Big\}\Big/\sqrt{N} \qquad (26),$$

which may be written*

$$\sigma_r = (\chi_\alpha - r^2)/\sqrt{N} \qquad (27),$$

the error of computation will not be great for values of f and r commonly met
with as the following table compared with the last will show:

Values of $\chi_\alpha - r^2$ for the stated values of ½(1−α):

r    | .500 | .309 | .159 | .067 | .023 | .006
.00  | 1.25 | 1.31 | 1.51 | 1.93 | 2.76 | 4.5
.25  | 1.19 | 1.25 | 1.45 | 1.86 | 2.70 | 4.4
.50  | 1.00 | 1.06 | 1.26 | 1.68 | 2.51 | 4.2
.75  | .69  | .75  | .95  | 1.36 | 2.20 | 3.9
1.00 | .25  | .31  | .51  | .93  | 1.76 | 3.5

(28).


The difference between the two expressions only reaches 5% when the
smaller group is less than 7% of the whole.
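
The closeness of the approximate to the full expression may be examined numerically. The sketch assumes the full numerator $\sqrt{\chi_\alpha^2 - \psi_2 r^2 + r^4}$ and the approximate numerator $\chi_\alpha - r^2$, with $\chi_\alpha = \sqrt{ff'}/z$; the function name `exact_and_approx` is hypothetical and the value r = .5 is chosen merely as an example.

```python
from math import sqrt
from statistics import NormalDist

def exact_and_approx(r, f):
    """Return the full and the approximate numerators of the standard deviation."""
    nd = NormalDist()
    a = nd.inv_cdf(1.0 - f)
    z = nd.pdf(a)
    chi2 = f * (1.0 - f) / z**2
    psi2 = 2.5 - a * a * chi2 - (2.0 * f - 1.0) * a / z
    exact = sqrt(chi2 - psi2 * r**2 + r**4)   # full formula
    approx = sqrt(chi2) - r**2                # approximate formula
    return exact, approx

for f in (0.5, 0.309, 0.159, 0.067):
    exact, approx = exact_and_approx(0.5, f)
    print(f, round(exact, 2), round(approx, 2))
```

When r is zero the two expressions coincide for every division, so that the difference arises wholly from the terms in r.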


It will not be necessary, excepting in small samples, to apply a correction 
to the bi-serial formula for r in virtue of the mean of samples differing from the 
population value. The correction is less than 1/Nth part of the value calculated 
unless one of the alternative classes is as small as 4% of the whole.


I have to thank fellow members of the Staff for assistance in calculating the 
tables. 


* For a table of $\chi_\alpha$ see Tables for Statisticians and Biometricians, p. 37.


ON THE PARTIAL CORRELATION RATIO. 


PART I. THEORETICAL. 
By L. ISSERLIS, B.A. 


§ 1. The theory of non-linear regression in the case of two correlated variables 
is due to Prof. Karl Pearson*. He shows that regression ceases to be linear 
when the correlation ratio η differs sensibly from the correlation coefficient r and
establishes criteria for parabolic, cubic and higher forms of regression. 


The present paper deals with the regression surface of three correlated variables
$x$, $y$, $z$, where, though the regression of $z$ on $x$, $y$ cannot be adequately represented
by an equation of type

$$\frac{\bar{z}_{xy} - \bar{z}}{\sigma_z} = \gamma_3 \frac{x - \bar{x}}{\sigma_x} + \gamma_3' \frac{y - \bar{y}}{\sigma_y} \qquad (1),$$

the regression of $z$ on $x$ for a constant $y$ and of $z$ on $y$ for a constant $x$ is linear.
A large proportion of the non-linear cases that occur in practice fall into this
class. It will be remembered that $\bar{z}_{xy}$ in (1) denoting the mean of the array of $z$'s
for a given $x$ and $y$ the coefficients $\gamma_3$, $\gamma_3'$ are partial regression coefficients and it
will appear that just as it is necessary to introduce the correlation ratio $\eta$ for an
adequate description of non-linear regression of two variables, there must be
introduced multiple or partial $\eta$'s for the description of such regression in the
case of more than two variables.


We recall the definition and principal properties of ${}_y\eta_x$, the correlation ratio
of $y$ on $x$, $\sigma_{a_y}$ being the square root mean weighted square standard deviations of
the arrays of $y$:

$$(1 - \eta^2)\sigma_y^2 = \sigma_{a_y}^2 = \frac{S(n_x \sigma_{n_x}^2)}{N} = \frac{SS\{n_{xy}(y - \bar{y}_x)^2\}}{N} \qquad (2),$$

$$\eta^2 = \frac{\sigma_M^2}{\sigma_y^2} = \frac{S\{n_x(\bar{y}_x - \bar{y})^2\}}{N\sigma_y^2} \qquad (3),$$

and

$$N(\eta^2 - r^2)\sigma_y^2 = S\{n_x(\bar{y}_x - Y)^2\} \qquad (4).$$


* Drapers’ Company Research Memoirs. Mathematical Contributions to the Theory of Evolution. 
XIV. ‘‘On the General Theory of Skew Correlation and Non-linear Regression.” 1905, 
Here we are dealing with $N$ pairs of two characters $A$ and $B$. $n_x$ of these have
the character $x$ of $A$. $\bar{y}_x$ is the mean of this $x$-array of $B$'s. $\sigma_{n_x}$ is the standard
deviation of this array, $\sigma_M$ is the weighted standard deviation of the means of the
arrays and $Y$ is the value of $y$ given by the regression straight line, i.e.

$$\frac{Y - \bar{y}}{\sigma_y} = r\,\frac{x - \bar{x}}{\sigma_x} \qquad (5).$$

This is the “best fitting” straight line (in the Gaussian sense) to the means
of the arrays, and is the regression line when the regression is linear.
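
The properties recalled above, namely that $\eta^2$ is the ratio of the weighted variance of the array means to the total variance, and that $N(\eta^2 - r^2)\sigma_y^2$ equals the weighted sum of squared departures of the array means from the regression line, can be verified on a small example. The data and the function name `correlation_ratio_check` below are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

def correlation_ratio_check(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    r2 = cov ** 2 / (var_x * var_y)

    arrays = defaultdict(list)          # the x-arrays of y
    for x, y in zip(xs, ys):
        arrays[x].append(y)
    array_mean = {x: sum(v) / len(v) for x, v in arrays.items()}

    # eta^2 = S{n_x (ybar_x - ybar)^2} / (N sigma_y^2)
    eta2 = sum(len(v) * (array_mean[x] - mean_y) ** 2
               for x, v in arrays.items()) / (n * var_y)

    # N (eta^2 - r^2) sigma_y^2 compared with S{n_x (ybar_x - Y)^2}
    def line(x):
        return mean_y + cov / var_x * (x - mean_x)

    lhs = n * (eta2 - r2) * var_y
    rhs = sum(len(v) * (array_mean[x] - line(x)) ** 2 for x, v in arrays.items())
    return eta2, r2, lhs, rhs

eta2, r2, lhs, rhs = correlation_ratio_check([0, 0, 1, 1, 2, 2], [1, 3, 2, 4, 9, 11])
print(eta2, r2, lhs, rhs)
```

In this example the array means do not lie on a straight line, so that $\eta^2$ exceeds $r^2$ and both sides of the last relation are positive and equal.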


§ 2. Consider now three correlated characters $A$, $B$, $C$. If $N$ combinations
of $A$, $B$, $C$ are taken, we may denote by $n_x$ the number of these which have the
character $A = x$ and by $n_{xy}$ the number in which $A$ has the value $x$ and $B$ has
the value $y$. Let $\bar{x}$, $\bar{y}$, $\bar{z}$ be the mean values of the total population, and let $\bar{z}_{xy}$
be the mean of $z$ for a given $x$ and $y$. The frequency of $A = x$, $B = y$, $C = z$
is $n_{xyz}$.

We define the correlation ratio of $z$ on $x$ and $y$, which may be denoted by
${}_{xy}H_z$, or if no confusion is likely to arise by $H_z$, by the equation

$$(1 - {}_{xy}H_z^2)\,N\sigma_z^2 = SSS\{n_{xyz}(z - \bar{z}_{xy})^2\} \qquad (6).$$


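The triple sum in the definition can be written

$$\begin{aligned}
SSS\{n_{xyz}(z-\bar z+\bar z-\bar z_{xy})^2\}
&=SS\{n_{xy}(\bar z-\bar z_{xy})^2\}+2\,SS\{(\bar z-\bar z_{xy})\times S[n_{xyz}(z-\bar z)]\}+SSS\{n_{xyz}(z-\bar z)^2\}\\
&=SS\{n_{xy}(\bar z-\bar z_{xy})^2\}-2\,SS\{n_{xy}(\bar z-\bar z_{xy})^2\}+N\sigma_z^2 ,
\end{aligned}$$

since $S[n_{xyz}(z-\bar z)]$, summed over $z$ for a given $x$ and $y$, is $n_{xy}(\bar z_{xy}-\bar z)$. Hence

$${}_{xy}H_z^2=\frac{SS\{n_{xy}(\bar z_{xy}-\bar z)^2\}}{N\sigma_z^2} \tag{7}$$

This is a generalisation of the property of ${}_x\eta_y$ given by (3).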


Further, the "best fitting" plane to the means $\bar z_{xy}$ is given by

$$\frac{Z-\bar z}{\sigma_z}=\gamma_3\,\frac{x-\bar x}{\sigma_x}+\gamma_3'\,\frac{y-\bar y}{\sigma_y} \tag{8}$$

where

$$\gamma_3=\frac{r_{xz}-r_{yz}r_{xy}}{1-r_{xy}^2} \tag{9}$$

$$\gamma_3'=\frac{r_{yz}-r_{xz}r_{xy}}{1-r_{xy}^2} \tag{10}$$

Let ${}_{xy}R_z$ denote as usual the maximum correlation of $z$ with any linear function
of $x$ and $y$, then

$${}_{xy}R_z^2=\gamma_3 r_{xz}+\gamma_3' r_{yz} \tag{11}$$

$$=\frac{r_{yz}^2+r_{xz}^2-2r_{yz}r_{xz}r_{xy}}{1-r_{xy}^2}.$$
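Subtract (11) from (7) after replacing $r_{xz}$ and $r_{yz}$ by the appropriate sums, and
we obtain

$$\begin{aligned}
[{}_{xy}H_z^2-{}_{xy}R_z^2]\,N\sigma_z^2
&=SS\{n_{xy}(\bar z_{xy}-\bar z)^2\}-SSS\Big\{n_{xyz}\frac{\sigma_z}{\sigma_x}(z-\bar z)(x-\bar x)\gamma_3\Big\}-SSS\Big\{n_{xyz}\frac{\sigma_z}{\sigma_y}(z-\bar z)(y-\bar y)\gamma_3'\Big\}\\
&=SS\Big\{n_{xy}(\bar z_{xy}-\bar z)^2-n_{xy}\frac{\sigma_z}{\sigma_x}(\bar z_{xy}-\bar z)(x-\bar x)\gamma_3-n_{xy}\frac{\sigma_z}{\sigma_y}(\bar z_{xy}-\bar z)(y-\bar y)\gamma_3'\Big\}.
\end{aligned}$$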


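Using (8) this can be written

$$\begin{aligned}
[{}_{xy}H_z^2-{}_{xy}R_z^2]\,N\sigma_z^2&=SS\{n_{xy}(\bar z_{xy}-\bar z)(\bar z_{xy}-Z)\}\\
&=SS\{n_{xy}(\bar z_{xy}-Z)^2\}+SS\{n_{xy}(Z-\bar z)(\bar z_{xy}-Z)\}
\end{aligned} \tag{13}$$

But

$$\begin{aligned}
SS\{n_{xy}(Z-\bar z)(\bar z_{xy}-Z)\}
&=SS\Big[n_{xy}\Big\{\gamma_3\frac{x-\bar x}{\sigma_x}\sigma_z+\gamma_3'\frac{y-\bar y}{\sigma_y}\sigma_z\Big\}\Big\{\bar z_{xy}-\bar z-\gamma_3\frac{x-\bar x}{\sigma_x}\sigma_z-\gamma_3'\frac{y-\bar y}{\sigma_y}\sigma_z\Big\}\Big]\\
&=N\sigma_z^2\,(\gamma_3 r_{xz}-\gamma_3^2-\gamma_3\gamma_3' r_{xy}+\gamma_3' r_{yz}-\gamma_3\gamma_3' r_{xy}-\gamma_3'^2)\\
&=N\sigma_z^2\,\{\gamma_3(r_{xz}-\gamma_3-\gamma_3' r_{xy})+\gamma_3'(r_{yz}-\gamma_3'-\gamma_3 r_{xy})\}
\end{aligned} \tag{14}$$

Using the values of $\gamma_3$ and $\gamma_3'$ given by (9) and (10) we see immediately that

$$r_{xz}-\gamma_3-\gamma_3' r_{xy}=r_{yz}-\gamma_3'-\gamma_3 r_{xy}=0.$$

Hence (13) becomes

$$({}_{xy}H_z^2-{}_{xy}R_z^2)\,N\sigma_z^2=SS\{n_{xy}(\bar z_{xy}-Z)^2\} \tag{15}$$

This is the generalisation of (4). We deduce that ${}_{xy}H_z={}_{xy}R_z$ if and only if the
regression is strictly linear, that otherwise ${}_{xy}H_z>{}_{xy}R_z$, and by (6) that ${}_{xy}H_z^2\le 1$.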
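As a numerical illustration of (7), (11) and (15), the following is a minimal sketch in Python. The function name `correlation_ratio_check` and the small frequency table `TABLE` are hypothetical and are not taken from the paper. The sketch computes ${}_{xy}H_z^2$ from the array means, ${}_{xy}R_z^2$ from the best-fitting plane, and the weighted mean square deviation of the array means from that plane; by (15) the first should equal the sum of the other two.

```python
from collections import defaultdict


def correlation_ratio_check(table):
    """Return (H2, R2, residual) for a frequency table {(x, y, z): n_xyz}."""
    N = sum(table.values())

    def mean(f):
        return sum(n * f(x, y, z) for (x, y, z), n in table.items()) / N

    mx = mean(lambda x, y, z: x)
    my = mean(lambda x, y, z: y)
    mz = mean(lambda x, y, z: z)
    sxx = mean(lambda x, y, z: (x - mx) ** 2)
    syy = mean(lambda x, y, z: (y - my) ** 2)
    szz = mean(lambda x, y, z: (z - mz) ** 2)
    sxy = mean(lambda x, y, z: (x - mx) * (y - my))
    sxz = mean(lambda x, y, z: (x - mx) * (z - mz))
    syz = mean(lambda x, y, z: (y - my) * (z - mz))

    # Best-fitting plane Z = mz + bx (x - mx) + by (y - my), from the
    # two normal equations; R2 is the squared multiple correlation.
    det = sxx * syy - sxy ** 2
    bx = (sxz * syy - syz * sxy) / det
    by = (syz * sxx - sxz * sxy) / det
    R2 = (bx * sxz + by * syz) / szz

    # Array frequencies n_xy and array sums of z.
    n_xy = defaultdict(float)
    z_sum = defaultdict(float)
    for (x, y, z), n in table.items():
        n_xy[(x, y)] += n
        z_sum[(x, y)] += n * z

    H2 = 0.0        # weighted squared deviation of array means from the mean
    residual = 0.0  # weighted squared deviation of array means from the plane
    for (x, y), n in n_xy.items():
        zbar = z_sum[(x, y)] / n
        plane = mz + bx * (x - mx) + by * (y - my)
        H2 += n * (zbar - mz) ** 2
        residual += n * (zbar - plane) ** 2
    return H2 / (N * szz), R2, residual / (N * szz)


# Hypothetical frequencies n_xyz, chosen only so that the array means
# of z do not lie on a plane.
TABLE = {
    (0, 0, 1): 2, (0, 0, 3): 2, (0, 1, 2): 3, (1, 0, 2): 3,
    (1, 1, 6): 2, (1, 1, 8): 2, (2, 0, 3): 1, (2, 1, 5): 1,
}
H2, R2, residual = correlation_ratio_check(TABLE)
print(H2, R2, residual)
```

For any table in which $x$ and $y$ are not perfectly correlated, `H2 - R2` should agree with `residual` up to rounding, and `H2` should lie between `R2` and 1.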


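§ 3. These properties and definitions can be extended to the case of $m$
variables $x_1, x_2, \dots x_m$. We now use ${}_{23\dots m}\bar x_1$ for the mean of $x_1$ when $x_2, x_3, \dots x_m$
are given and denote by $\underset{1\dots p}{S}$ a summation extending to the variables $x_1, x_2, \dots x_p$.

If we define the correlation ratio of $x_1$ on $x_2, x_3, \dots x_m$ by the equation

$$(1-{}_{23\dots m}H_1^2)\,N\sigma_1^2=\underset{1\dots m}{S}\{n_{12\dots m}(x_1-{}_{23\dots m}\bar x_1)^2\} \tag{16}$$

we can deduce in the same way as in § 2 the relation

$${}_{23\dots m}H_1^2\,N\sigma_1^2=\underset{2\dots m}{S}\{n_{2\dots m}({}_{23\dots m}\bar x_1-\bar x_1)^2\} \tag{17}$$

In order to generalise (15) we recall that the "best fitting" linear function $X_1$ of
the variables $x_2, x_3, \dots x_m$ to the mean ${}_{23\dots m}\bar x_1$ is given by

$$\frac{X_1-\bar x_1}{\sigma_1}+b_{12}\frac{x_2-\bar x_2}{\sigma_2}+b_{13}\frac{x_3-\bar x_3}{\sigma_3}+\dots+b_{1m}\frac{x_m-\bar x_m}{\sigma_m}=0 \tag{18}$$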
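where $b_{pq}=R_{pq}/R_{pp}$ and $R_{pq}$ is the minor with its proper sign of the element in the $p$th
row and $q$th column of the determinant

$$R=\begin{vmatrix}
1 & r_{12} & r_{13} & \dots & r_{1m}\\
r_{21} & 1 & r_{23} & \dots & r_{2m}\\
\vdots & & & & \vdots\\
r_{m1} & r_{m2} & r_{m3} & \dots & 1
\end{vmatrix} \tag{19}$$

while the maximum correlation between any linear function of $x_2, x_3, \dots x_m$ and
$x_1$ is ${}_{23\dots m}R_1$, where

$$-{}_{23\dots m}R_1^2=\frac{R-R_{11}}{R_{11}}=b_{12}r_{12}+b_{13}r_{13}+\dots+b_{1m}r_{1m} \tag{20}$$

$$=b_{12}\frac{S\{n_{12}(x_1-\bar x_1)(x_2-\bar x_2)\}}{N\sigma_1\sigma_2}+b_{13}\frac{S\{n_{13}(x_1-\bar x_1)(x_3-\bar x_3)\}}{N\sigma_1\sigma_3}+\dots+b_{1m}\frac{S\{n_{1m}(x_1-\bar x_1)(x_m-\bar x_m)\}}{N\sigma_1\sigma_m} \tag{21}$$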
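Subtracting ${}_{23\dots m}R_1^2$, as given by (21), from (17), and noting that $n_2=\underset{3\dots m}{S}\{n_{23\dots m}\}$, etc., we obtain*

$$\begin{aligned}
({}_{23\dots m}H_1^2-{}_{23\dots m}R_1^2)\,N\sigma_1^2
&=\underset{2\dots m}{S}\{n_{2\dots m}({}_{2\dots m}\bar x_1-\bar x_1)^2\}+\underset{1,2}{S}\Big\{n_{12}\frac{\sigma_1}{\sigma_2}b_{12}(x_1-\bar x_1)(x_2-\bar x_2)\Big\}+\dots\\
&\qquad+\underset{1,m}{S}\Big\{n_{1m}\frac{\sigma_1}{\sigma_m}b_{1m}(x_1-\bar x_1)(x_m-\bar x_m)\Big\}\\
&=\underset{2\dots m}{S}\{n_{2\dots m}({}_{2\dots m}\bar x_1-\bar x_1)^2\}+\underset{2\dots m}{S}\Big\{n_{2\dots m}\frac{\sigma_1}{\sigma_2}b_{12}({}_{2\dots m}\bar x_1-\bar x_1)(x_2-\bar x_2)\Big\}+\dots\\
&\qquad+\underset{2\dots m}{S}\Big\{n_{2\dots m}\frac{\sigma_1}{\sigma_m}b_{1m}({}_{2\dots m}\bar x_1-\bar x_1)(x_m-\bar x_m)\Big\}\\
&=\underset{2\dots m}{S}\Big[n_{2\dots m}({}_{2\dots m}\bar x_1-\bar x_1)\Big\{{}_{2\dots m}\bar x_1-\bar x_1+b_{12}\frac{\sigma_1}{\sigma_2}(x_2-\bar x_2)+\dots+b_{1m}\frac{\sigma_1}{\sigma_m}(x_m-\bar x_m)\Big\}\Big]
\end{aligned} \tag{22}$$

using (18) this equation becomes

$$\begin{aligned}
({}_{23\dots m}H_1^2-{}_{23\dots m}R_1^2)\,N\sigma_1^2
&=\underset{2\dots m}{S}\{n_{2\dots m}({}_{2\dots m}\bar x_1-\bar x_1)({}_{2\dots m}\bar x_1-X_1)\}\\
&=\underset{2\dots m}{S}\{n_{2\dots m}({}_{2\dots m}\bar x_1-X_1)^2\}+\underset{2\dots m}{S}\{n_{2\dots m}(X_1-\bar x_1)({}_{2\dots m}\bar x_1-X_1)\}
\end{aligned} \tag{23}$$

* By an extension of the notation described at the beginning of this section $\underset{3\dots m}{S}$ denotes a
summation with regard to the variables $x_3, x_4, \dots x_m$; $n_{1,2,\dots m}$ is the frequency of a particular
combination of the characters $x_1, x_2, \dots x_m$, while $n_{12}$ is the frequency of the combination $x_1$, $x_2$.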
But

$$\underset{2\dots m}{S}\{n_{2\dots m}(X_1-\bar x_1)({}_{2\dots m}\bar x_1-X_1)\}
=\underset{2\dots m}{S}\Big[n_{2\dots m}\Big(-b_{12}\frac{x_2-\bar x_2}{\sigma_2}\sigma_1-b_{13}\frac{x_3-\bar x_3}{\sigma_3}\sigma_1-\dots\Big)\Big({}_{2\dots m}\bar x_1-\bar x_1+b_{12}\frac{x_2-\bar x_2}{\sigma_2}\sigma_1+\dots\Big)\Big] \tag{24}$$

and

$$N\sigma_1\sigma_2 r_{12}=\underset{1\dots m}{S}\{n_{1\dots m}(x_1-\bar x_1)(x_2-\bar x_2)\}=\underset{2\dots m}{S}\{n_{2\dots m}(x_2-\bar x_2)({}_{2\dots m}\bar x_1-\bar x_1)\}$$

with similar values for $r_{13}$, $r_{23}$, etc.

$\therefore$ the right hand side of (24)

$$\begin{aligned}
&=N\sigma_1^2\,(-b_{12}r_{12}-b_{12}^2-b_{12}b_{13}r_{23}-\dots-b_{13}r_{13}-b_{13}b_{12}r_{23}-b_{13}^2-\dots)\\
&=\sigma_1^2 b_{12}N\,(-r_{12}-b_{12}-b_{13}r_{23}-b_{14}r_{24}-\dots-b_{1m}r_{2m})\\
&\quad+\sigma_1^2 b_{13}N\,(-r_{13}-b_{13}-b_{12}r_{23}-\dots)\\
&\quad+\text{etc.}
\end{aligned} \tag{25}$$

Each line in (25) is identically zero from the definition of the $b$'s and the properties
of the determinant in (19).

Hence

$${}_{23\dots m}H_1^2-{}_{23\dots m}R_1^2=\frac{\underset{2\dots m}{S}\{n_{2\dots m}({}_{2\dots m}\bar x_1-X_1)^2\}}{N\sigma_1^2} \tag{26}$$

so that the fundamental properties proved by Professor Pearson in connection
with the correlation ratio $\eta$ hold for the generalised $H$ defined in this section.
In particular equation (26) shows that a necessary and sufficient condition for
linear regression in multiple correlation of $m$ variables is that

$${}_{23\dots m}H_1={}_{23\dots m}R_1 .$$

For in this case the mean value of any array of $x_1$ will lie on the "best fitting"
$m$-dimensional plane.


§ 4. The regression surface of $z$ on $x$, $y$ being assumed of any particular type,
the constants in the equation may be determined (i) by the method of least
squares, i.e. by making the sum of the squares of the deviations $\bar z_{xy}-\phi(x,y)$ a
minimum, $z=\phi(x,y)$ being the regression surface, or (ii) by giving such values
to the constants that the correlation between $z$ and $\phi(x,y)$ shall be a maximum.
When $\phi(x,y)$ is of the second degree the two methods lead to identical equations
for the determination of the coefficients.


The same equations are also obtained if the surface be "fitted" to the means
by the method of moments. There is, however, a distinction to be observed. The
equation $z=\phi(x,y)$ when the regression surface is of specified degree contains a
definite number of constants and the first two methods will give exactly as many
independent equations as there are constants to be determined. The method of
moments will give as many equations as we please if sufficiently high moments
are used, including of course the equations given by the "least squares" method
or maximum correlation method. Even without introducing high moments, when
there are three variable characters new equations may be obtained by the method
of moments, by combinations of characters which do not arise in the other methods.
The method of moments is most convenient for our purpose, but we shall only
employ those equations which can also be justified by (say) the method of least
squares.

For convenience let the origin be taken at the mean of the three characters so
that $\bar x=\bar y=\bar z=0$; let $q_{x^sy^tz^u}$ denote

$$\frac{SSS\{n_{xyz}\,x^sy^tz^u\}}{N\sigma_x^s\sigma_y^t\sigma_z^u}=\frac{p_{x^sy^tz^u}}{\sigma_x^s\sigma_y^t\sigma_z^u} \tag{27}$$

With this notation $r_{xy}$ and $q_{xy}$ are identical; when $z$ does not appear in the
product, it is sufficient to write $q_{x^sy^t}$.

The most reasonable next approximation to make when a linear function
$\phi(x,y)$ does not adequately represent the statistics is $\phi(x,y)=$ a quadratic
function of $x$, $y$.

Let

$$\frac{\bar z_{xy}}{\sigma_z}=a\frac{x}{\sigma_x}+b\frac{y}{\sigma_y}+c\frac{xy}{\sigma_x\sigma_y}+d+e\frac{x^2}{\sigma_x^2}+f\frac{y^2}{\sigma_y^2} \tag{28}$$

Multiply (28) by $n_{xy}$, sum for all values of $x$, $y$, $z$ and divide by $N$:

$$0=d+c\,r_{xy}+e+f \tag{29}$$

Multiply (28) in turn by $n_{xy}$ times $\dfrac{x}{\sigma_x}$, $\dfrac{y}{\sigma_y}$, $\dfrac{xy}{\sigma_x\sigma_y}$, $\dfrac{x^2}{\sigma_x^2}$, $\dfrac{y^2}{\sigma_y^2}$ and sum as before,
and we obtain

$$r_{xz}=a+b\,r_{xy}+c\,q_{x^2y}+e\,q_{x^3}+f\,q_{xy^2} \tag{30}$$

$$r_{yz}=a\,r_{xy}+b+c\,q_{xy^2}+e\,q_{x^2y}+f\,q_{y^3} \tag{31}$$

$$q_{xyz}=d\,r_{xy}+a\,q_{x^2y}+b\,q_{xy^2}+c\,q_{x^2y^2}+e\,q_{x^3y}+f\,q_{xy^3} \tag{32}$$

$$q_{x^2z}=d+a\,q_{x^3}+b\,q_{x^2y}+c\,q_{x^3y}+e\,q_{x^4}+f\,q_{x^2y^2} \tag{33}$$

$$q_{y^2z}=d+a\,q_{xy^2}+b\,q_{y^3}+c\,q_{xy^3}+e\,q_{x^2y^2}+f\,q_{y^4} \tag{34}$$

Actual numerical fitting shows that in many cases $e$ and $f$ are small compared
with $c$*. This is the case when the regression of $z$ on $x$ for a constant $y$ and of
$z$ on $y$ for a constant $x$ is linear. We shall therefore confine ourselves in the
present preliminary paper to the case where we may write

$$\frac{\bar z_{xy}}{\sigma_z}=a\frac{x}{\sigma_x}+b\frac{y}{\sigma_y}+c\frac{xy}{\sigma_x\sigma_y}+d \tag{35}$$

Here for constant $x$ or $y$ the regression of $z$ on $y$ or of $z$ on $x$ is linear.


* Cf. Census of Scotland, 1911, Vol. III. p. xlvii, where Mr G. Rae obtains by moments the
regression of fertility on age of husband and wife. Let $W$ = age of wife, $H$ = age of husband, $C$ = number
of children in completed marriages. He finds

$$C_{WH}=20.149493-0.555812\,W-0.173804\,H-0.002846\,W^2-0.003494\,H^2+0.012675\,WH.$$

See also the paper by E. M. Elderton in the current part of this Journal, pp. 291-295.
Equations (29) to (32) become, when the regression surface is given by (35),

$$0=d+c\,r_{xy} \tag{36}$$

$$r_{xz}=a+b\,r_{xy}+c\,q_{x^2y} \tag{37}$$

$$r_{yz}=a\,r_{xy}+b+c\,q_{xy^2} \tag{38}$$

$$q_{xyz}=d\,r_{xy}+a\,q_{x^2y}+b\,q_{xy^2}+c\,q_{x^2y^2} \tag{39}$$

$$=-c\,r_{xy}^2+a\,q_{x^2y}+b\,q_{xy^2}+c\,q_{x^2y^2}\quad\text{by (36).}$$

Solving these equations by determinants we obtain

$$\frac{a}{\Delta_a}=\frac{b}{\Delta_b}=\frac{c}{\Delta_c}=\frac{1}{\Delta},\qquad
\Delta=\begin{vmatrix}1 & r_{xy} & q_{x^2y}\\ r_{xy} & 1 & q_{xy^2}\\ q_{x^2y} & q_{xy^2} & q_{x^2y^2}-r_{xy}^2\end{vmatrix} \tag{40}$$

where $\Delta_a$, $\Delta_b$, $\Delta_c$ are obtained from $\Delta$ by replacing its first, second and third column respectively by the column $r_{xz}$, $r_{yz}$, $q_{xyz}$.

We have already denoted the partial regression coefficients by $\gamma_3$ and $\gamma_3'$ so that

$$\gamma_3=\frac{r_{xz}-r_{yz}r_{xy}}{1-r_{xy}^2}\quad\text{and}\quad\gamma_3'=\frac{r_{yz}-r_{xz}r_{xy}}{1-r_{xy}^2}\qquad\text{(9) and (10).}$$

In addition let

$$\theta=\frac{r_{xy}\,q_{xy^2}-q_{x^2y}}{1-r_{xy}^2} \tag{41}$$

and

$$\phi=\frac{r_{xy}\,q_{x^2y}-q_{xy^2}}{1-r_{xy}^2} \tag{42}$$

After some reductions the determinants in (40) yield

$$a=\gamma_3+c\,\theta \tag{43}$$

$$b=\gamma_3'+c\,\phi \tag{44}$$

$$c=\frac{\theta\,r_{xz}+\phi\,r_{yz}+q_{xyz}}{\theta\,q_{x^2y}+\phi\,q_{xy^2}+q_{x^2y^2}-r_{xy}^2} \tag{45}$$

also

$$d=-c\,r_{xy} \tag{36}$$

Note that

$$\gamma_3\,q_{x^2y}+\gamma_3'\,q_{xy^2}=-(\theta\,r_{xz}+\phi\,r_{yz}) \tag{46}$$

and $\gamma_3 r_{xz}+\gamma_3' r_{yz}={}_{xy}R_z^2$ (cf. eqn. 11).
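The solution of the moment equations can be checked numerically. The sketch below is an illustration only: the function name `hyperbolic_constants` and the moment values used in the example are hypothetical. It assumes the forms $\theta=(r_{xy}q_{xy^2}-q_{x^2y})/(1-r_{xy}^2)$, $\phi=(r_{xy}q_{x^2y}-q_{xy^2})/(1-r_{xy}^2)$, $a=\gamma_3+c\theta$, $b=\gamma_3'+c\phi$, $d=-c\,r_{xy}$ and $c=(\theta r_{xz}+\phi r_{yz}+q_{xyz})/(\theta q_{x^2y}+\phi q_{xy^2}+q_{x^2y^2}-r_{xy}^2)$, and verifies that the resulting constants satisfy equations (36) to (39).

```python
def hyperbolic_constants(r_xy, r_xz, r_yz, q_x2y, q_xy2, q_x2y2, q_xyz):
    """Constants a, b, c, d of the surface (35) from standardised moments."""
    k = 1.0 - r_xy ** 2
    gamma3 = (r_xz - r_yz * r_xy) / k        # partial regression coefficient (9)
    gamma3p = (r_yz - r_xz * r_xy) / k       # partial regression coefficient (10)
    theta = (r_xy * q_xy2 - q_x2y) / k       # assumed form of (41)
    phi = (r_xy * q_x2y - q_xy2) / k         # assumed form of (42)
    c = (theta * r_xz + phi * r_yz + q_xyz) / (
        theta * q_x2y + phi * q_xy2 + q_x2y2 - r_xy ** 2
    )
    a = gamma3 + c * theta
    b = gamma3p + c * phi
    d = -c * r_xy
    return a, b, c, d


# Hypothetical moment values, for illustration only.
r_xy, r_xz, r_yz = 0.5, 0.4, 0.3
q_x2y, q_xy2, q_x2y2, q_xyz = 0.1, -0.2, 1.6, 0.25
a, b, c, d = hyperbolic_constants(r_xy, r_xz, r_yz, q_x2y, q_xy2, q_x2y2, q_xyz)

# Residuals of the four moment equations (36)-(39); each should vanish.
residuals = [
    d + c * r_xy,
    a + b * r_xy + c * q_x2y - r_xz,
    a * r_xy + b + c * q_xy2 - r_yz,
    d * r_xy + a * q_x2y + b * q_xy2 + c * q_x2y2 - q_xyz,
]
print(residuals)
```

The four residuals should be zero to rounding error for any admissible set of moments, which is a quick way of confirming the signs in $\theta$ and $\phi$.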
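§ 5. By definition

$$(1-{}_{xy}H_z^2)=\frac{SSS\{n_{xyz}(z-\bar z_{xy})^2\}}{N\sigma_z^2};$$

$\therefore$ using (35)

$$\begin{aligned}
(1-{}_{xy}H_z^2)&=SSS\Big\{\Big(\frac{z}{\sigma_z}-d-a\frac{x}{\sigma_x}-b\frac{y}{\sigma_y}-c\frac{xy}{\sigma_x\sigma_y}\Big)^2\frac{n_{xyz}}{N}\Big\}\\
&=1-2\,SSS\Big\{\frac{z}{\sigma_z}\Big(d+a\frac{x}{\sigma_x}+b\frac{y}{\sigma_y}+c\frac{xy}{\sigma_x\sigma_y}\Big)\frac{n_{xyz}}{N}\Big\}\\
&\quad+d^2+a^2+b^2+c^2q_{x^2y^2}+2dc\,r_{xy}+2ab\,r_{xy}+2ac\,q_{x^2y}+2bc\,q_{xy^2},
\end{aligned}$$

$$\begin{aligned}
-{}_{xy}H_z^2&=-2a\,r_{xz}-2b\,r_{yz}-2c\,q_{xyz}+a^2+b^2+d^2+c^2q_{x^2y^2}\\
&\quad+2cd\,r_{xy}+2ab\,r_{xy}+2ac\,q_{x^2y}+2bc\,q_{xy^2}
\end{aligned} \tag{47}$$

$$\begin{aligned}
&=a\,(-r_{xz}+a+b\,r_{xy}+c\,q_{x^2y})\\
&\quad+b\,(-r_{yz}+b+a\,r_{xy}+c\,q_{xy^2})\\
&\quad+c\,(-q_{xyz}+d\,r_{xy}+a\,q_{x^2y}+b\,q_{xy^2}+c\,q_{x^2y^2})\\
&\quad+d\,(d+c\,r_{xy})\\
&\quad-a\,r_{xz}-b\,r_{yz}-c\,q_{xyz}
\end{aligned} \tag{48}$$

The first four terms vanish by eqns. (36)...(39),

$$\therefore\ {}_{xy}H_z^2=a\,r_{xz}+b\,r_{yz}+c\,q_{xyz} \tag{49}$$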


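If we now insert the values of $a$, $b$, $c$ from (43)...(45) in (49)

$$\begin{aligned}
{}_{xy}H_z^2&=(\gamma_3+c\,\theta)\,r_{xz}+(\gamma_3'+c\,\phi)\,r_{yz}+c\,q_{xyz}\\
&=\gamma_3 r_{xz}+\gamma_3' r_{yz}+c\,(\theta\,r_{xz}+\phi\,r_{yz}+q_{xyz})\\
&={}_{xy}R_z^2+\frac{(\theta\,r_{xz}+\phi\,r_{yz}+q_{xyz})^2}{\theta\,q_{x^2y}+\phi\,q_{xy^2}+q_{x^2y^2}-r_{xy}^2},
\end{aligned}$$

or

$${}_{xy}H_z^2-{}_{xy}R_z^2=\frac{\{q_{xyz}(1-r_{xy}^2)+q_{xy^2}(r_{xz}r_{xy}-r_{yz})+q_{x^2y}(r_{yz}r_{xy}-r_{xz})\}^2}{(1-r_{xy}^2)^2\Big\{q_{x^2y^2}-r_{xy}^2-\dfrac{q_{x^2y}^2+q_{xy^2}^2-2q_{x^2y}q_{xy^2}r_{xy}}{1-r_{xy}^2}\Big\}} \tag{50}$$

It follows from (50) and (15) that

$$q_{x^2y^2}-r_{xy}^2-\frac{q_{x^2y}^2+q_{xy^2}^2-2q_{x^2y}q_{xy^2}r_{xy}}{1-r_{xy}^2}>0 \tag{51}$$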


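If we eliminate $q_{xyz}$ (which is a triple moment troublesome to calculate)
between equations (39) and (49) we have

$$\begin{aligned}
{}_{xy}H_z^2&=a\,r_{xz}+b\,r_{yz}+c\,q_{xyz}\\
&=a\,r_{xz}+b\,r_{yz}+c^2(q_{x^2y^2}-r_{xy}^2)+ac\,q_{x^2y}+bc\,q_{xy^2}\\
&=r_{xz}(\gamma_3+c\,\theta)+r_{yz}(\gamma_3'+c\,\phi)+c^2(q_{x^2y^2}-r_{xy}^2)\\
&\quad+(\gamma_3c+c^2\theta)\,q_{x^2y}+(\gamma_3'c+c^2\phi)\,q_{xy^2}\\
&=c^2[q_{x^2y^2}-r_{xy}^2+\theta\,q_{x^2y}+\phi\,q_{xy^2}]+c\,[\theta\,r_{xz}+\phi\,r_{yz}+\gamma_3q_{x^2y}+\gamma_3'q_{xy^2}]\\
&\quad+\gamma_3r_{xz}+\gamma_3'r_{yz}\\
&={}_{xy}R_z^2+c^2[q_{x^2y^2}-r_{xy}^2+\theta\,q_{x^2y}+\phi\,q_{xy^2}]\quad\text{by (46) and (11),}
\end{aligned}$$

$${}_{xy}H_z^2-{}_{xy}R_z^2=c^2\Big[q_{x^2y^2}-r_{xy}^2-\frac{q_{x^2y}^2+q_{xy^2}^2-2q_{x^2y}q_{xy^2}r_{xy}}{1-r_{xy}^2}\Big].$$

Hence

$$c^2=\frac{{}_{xy}H_z^2-{}_{xy}R_z^2}{q_{x^2y^2}-r_{xy}^2-\dfrac{q_{x^2y}^2+q_{xy^2}^2-2q_{x^2y}q_{xy^2}r_{xy}}{1-r_{xy}^2}} \tag{52}$$

This value of $c^2$ is positive by (51).
Equation (52) shows that ${}_{xy}H_z^2={}_{xy}R_z^2$ is a necessary condition for linear
regression, which we have already proved in equation (26).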
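Relations (49) and (52) can be illustrated on artificial data. The following sketch is not part of the paper: the helper names `build_table` and `check_49_and_52` and the cell frequencies are hypothetical. It builds a frequency table whose array means of $z$ are exactly of the bilinear type (35), then compares the correlation ratio computed directly from the array means with $a\,r_{xz}+b\,r_{yz}+c\,q_{xyz}$, and ${}_{xy}H_z^2-{}_{xy}R_z^2$ with $c^2$ times the denominator of (52).

```python
from collections import defaultdict
from math import sqrt


def build_table():
    """Frequency table {(x, y, z): n} with array means 1 + 2x - y + 0.5xy."""
    cells = {(0, 0): 4, (0, 1): 2, (0, 2): 1, (1, 0): 2, (1, 1): 4,
             (1, 2): 2, (2, 0): 1, (2, 1): 2, (2, 2): 4}
    table = {}
    for (x, y), n in cells.items():
        m = 1 + 2 * x - y + 0.5 * x * y
        table[(x, y, m - 1)] = n / 2   # half the array one unit below its mean
        table[(x, y, m + 1)] = n / 2   # half the array one unit above its mean
    return table


def check_49_and_52(table):
    N = sum(table.values())

    def E(f):
        return sum(n * f(x, y, z) for (x, y, z), n in table.items()) / N

    mx, my, mz = E(lambda x, y, z: x), E(lambda x, y, z: y), E(lambda x, y, z: z)
    sx = sqrt(E(lambda x, y, z: (x - mx) ** 2))
    sy = sqrt(E(lambda x, y, z: (y - my) ** 2))
    sz = sqrt(E(lambda x, y, z: (z - mz) ** 2))

    def q(s, t, u=0):
        # Standardised product moment of order s, t, u about the means.
        return E(lambda x, y, z: ((x - mx) / sx) ** s
                 * ((y - my) / sy) ** t * ((z - mz) / sz) ** u)

    r, rxz, ryz = q(1, 1), q(1, 0, 1), q(0, 1, 1)
    q21, q12, q22, qxyz = q(2, 1), q(1, 2), q(2, 2), q(1, 1, 1)
    k = 1 - r * r
    g3, g3p = (rxz - ryz * r) / k, (ryz - rxz * r) / k
    theta, phi = (r * q12 - q21) / k, (r * q21 - q12) / k
    denom = theta * q21 + phi * q12 + q22 - r * r   # denominator of (52)
    c = (theta * rxz + phi * ryz + qxyz) / denom
    a, b = g3 + c * theta, g3p + c * phi
    R2 = g3 * rxz + g3p * ryz

    # Correlation ratio computed directly from the array means.
    n_xy, z_sum = defaultdict(float), defaultdict(float)
    for (x, y, z), n in table.items():
        n_xy[(x, y)] += n
        z_sum[(x, y)] += n * z
    H2 = sum(n * ((z_sum[key] / n - mz) / sz) ** 2 for key, n in n_xy.items()) / N

    return H2, a * rxz + b * ryz + c * qxyz, H2 - R2, c * c * denom


H2_direct, H2_from_49, lhs_52, rhs_52 = check_49_and_52(build_table())
print(H2_direct, H2_from_49, lhs_52, rhs_52)
```

Because the array means here follow (35) exactly, the first two returned values should agree, and so should the last two, up to rounding. With real data the agreement would only be as good as the fit of (35).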
The regression surface of $z$ on $x$, $y$ is, with the values we have now obtained
for the constants

$$\begin{aligned}
\frac{\bar z_{xy}-\bar z}{\sigma_z}&=\frac{r_{xz}-r_{yz}r_{xy}}{1-r_{xy}^2}\,\frac{x-\bar x}{\sigma_x}+\frac{r_{yz}-r_{xz}r_{xy}}{1-r_{xy}^2}\,\frac{y-\bar y}{\sigma_y}\\
&\quad\pm\sqrt{\frac{{}_{xy}H_z^2-{}_{xy}R_z^2}{q_{x^2y^2}-r_{xy}^2-\dfrac{q_{x^2y}^2+q_{xy^2}^2-2q_{x^2y}q_{xy^2}r_{xy}}{1-r_{xy}^2}}}\;
\Big\{\frac{r_{xy}q_{xy^2}-q_{x^2y}}{1-r_{xy}^2}\,\frac{x-\bar x}{\sigma_x}+\frac{r_{xy}q_{x^2y}-q_{xy^2}}{1-r_{xy}^2}\,\frac{y-\bar y}{\sigma_y}\\
&\qquad\qquad+\frac{(x-\bar x)(y-\bar y)}{\sigma_x\sigma_y}-r_{xy}\Big\}
\end{aligned} \tag{53}$$

The terms in the first line give the ordinary regression plane. In most cases
the regression does not differ widely from linearity so that ${}_{xy}H_z^2-{}_{xy}R_z^2$ is small.


§ 6. We must now get some idea of the relative magnitudes of $q_{x^2y}$, $q_{xy^2}$, $q_{x^2y^2}$
and moments of lower order.

First with regard to $q_{x^2y}$, which is equal to $\dfrac{p_{x^2y}}{\sigma_x^2\sigma_y}$,

$$q_{x^2y}=\frac{S(n_{xy}\,x^2y)}{N\sigma_x^2\sigma_y}=\frac{S(n_x\,\bar y_x\,x^2)}{N\sigma_x^2\sigma_y} \tag{54}$$

If the regression of $y$ on $x$ be linear

$$q_{x^2y}=\frac{S\Big(n_x\,r_{xy}\dfrac{\sigma_y}{\sigma_x}\,x^3\Big)}{N\sigma_x^2\sigma_y}=r_{xy}\sqrt{\beta_1},$$

so that $q_{x^2y}$ is zero if the regression is linear and the frequency of $x$ symmetrical.
In fact $q_{x^2y}-r_{xy}\sqrt{\beta_1}=0$ is the same as Pearson's criterion for linear regression given
by $\epsilon=0$ (Skew Correlation and Non-linear Regression, p. 30, Eqn. (lxix)).
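The statement that $q_{x^2y}=r_{xy}\sqrt{\beta_1}$ under linear regression can be tried on a small artificial table. This is a sketch only; the function name `linear_regression_moment_check` and the frequencies are hypothetical, and $\sqrt{\beta_1}$ is taken with the sign of the third moment of $x$, i.e. as $\mu_3/\sigma_x^3$.

```python
from math import sqrt


def linear_regression_moment_check():
    """Return (q_x2y, r_xy * sqrt(beta_1)) for a table with linear array means."""
    n_x = {0: 4, 1: 3, 2: 2, 5: 1}      # skew distribution of x (hypothetical)
    table = {}
    for x, n in n_x.items():
        m = 2 + 3 * x                   # array mean of y, exactly linear in x
        table[(x, m - 1)] = n / 2
        table[(x, m + 1)] = n / 2

    N = sum(table.values())

    def E(f):
        return sum(n * f(x, y) for (x, y), n in table.items()) / N

    mx, my = E(lambda x, y: x), E(lambda x, y: y)
    sx = sqrt(E(lambda x, y: (x - mx) ** 2))
    sy = sqrt(E(lambda x, y: (y - my) ** 2))

    r = E(lambda x, y: (x - mx) * (y - my)) / (sx * sy)
    q_x2y = E(lambda x, y: (x - mx) ** 2 * (y - my)) / (sx ** 2 * sy)
    root_beta1 = E(lambda x, y: (x - mx) ** 3) / sx ** 3   # signed sqrt(beta_1)
    return q_x2y, r * root_beta1


q_x2y, r_root_beta1 = linear_regression_moment_check()
print(q_x2y, r_root_beta1)
```

The two returned numbers should coincide up to rounding; making the distribution of $x$ symmetrical would make both vanish.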


We may obtain a good approximation for $q_{x^2y}$ by considering the regression of
$y$ on $x$ to be parabolic. This is a natural assumption to make if the regression
surfaces of $x$ on $z$, $y$ and $y$ on $z$, $x$ be also of the hyperboloid type we are discussing
for the regression of $z$ on $x$, $y$.


For with origin at mean we may write 


@, Zia, 
Bip eg IS de” 


Ox Cy Ox Oyoz 
Hence keeping y constant and summing for the 2’s 


Bh Ss, WEI hyey 


(on Oy Oe Gy Ozs 
Zz y V yne — 7 ye (Y == i 
But rye Pye" Yee Nymd Pye 2 _ y: ie a V By’ Shee) | x 
Ge © By — By —1 (ey Cy | 


* See Pearson: l.c. Eqn. (lxv) (where Y,, is a misprint for X,), and py”, 8; refer to the distribution 


of z. 


Px 
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“" is a quadratic expression in y if we remember that h being of order 
Ox 


Ci SS Wan ane oa peas 
NE ay is of the same order as Vyn2—7yz*. 
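But the relation $\phi_2({}_x\eta_y^2-r_{xy}^2)-\epsilon^2=0$ is satisfied when the regression of $y$ on
$x$ is parabolic†.

Here $\phi_2=\beta_2-\beta_1-1$, $\epsilon=\epsilon'-r_{xy}\sqrt{\beta_1}$, $\epsilon'=q_{x^2y}$‡.

$$\therefore\ q_{x^2y}-r_{xy}\sqrt{\beta_1}=\sqrt{{}_x\eta_y^2-r_{xy}^2}\,\sqrt{\beta_2-\beta_1-1} \tag{55}$$

Similarly

$$q_{xy^2}-r_{xy}\sqrt{\beta_1'}=\sqrt{{}_y\eta_x^2-r_{xy}^2}\,\sqrt{\beta_2'-\beta_1'-1} \tag{56}$$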
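Relation (55) can be checked in squared form, which avoids the question of sign, on a table whose array means of $y$ are exactly parabolic in $x$. The sketch below is illustrative only: the function name `parabolic_regression_check` and the frequencies are hypothetical. It returns $(q_{x^2y}-r_{xy}\sqrt{\beta_1})^2$ and $({}_x\eta_y^2-r_{xy}^2)(\beta_2-\beta_1-1)$, where $\beta_1$, $\beta_2$ refer to the distribution of $x$.

```python
from collections import defaultdict
from math import sqrt


def parabolic_regression_check():
    """Return the squared left and right sides of (55), plus eta^2 and r^2."""
    n_x = {0: 4, 1: 3, 2: 2, 5: 1}            # hypothetical frequencies of x
    table = {}
    for x, n in n_x.items():
        m = 1 + x + 0.5 * x * x               # array mean of y, parabolic in x
        table[(x, m - 1)] = n / 2
        table[(x, m + 1)] = n / 2

    N = sum(table.values())

    def E(f):
        return sum(n * f(x, y) for (x, y), n in table.items()) / N

    mx, my = E(lambda x, y: x), E(lambda x, y: y)
    sx = sqrt(E(lambda x, y: (x - mx) ** 2))
    sy = sqrt(E(lambda x, y: (y - my) ** 2))

    r = E(lambda x, y: (x - mx) * (y - my)) / (sx * sy)
    q_x2y = E(lambda x, y: (x - mx) ** 2 * (y - my)) / (sx ** 2 * sy)
    root_beta1 = E(lambda x, y: (x - mx) ** 3) / sx ** 3   # signed sqrt(beta_1)
    beta2 = E(lambda x, y: (x - mx) ** 4) / sx ** 4

    # Correlation ratio of y on x from the means of the x-arrays.
    count, total = defaultdict(float), defaultdict(float)
    for (x, y), n in table.items():
        count[x] += n
        total[x] += n * y
    eta2 = sum(n * (total[x] / n - my) ** 2 for x, n in count.items()) / (N * sy ** 2)

    lhs = (q_x2y - r * root_beta1) ** 2
    rhs = (eta2 - r * r) * (beta2 - root_beta1 ** 2 - 1)
    return lhs, rhs, eta2, r * r


lhs, rhs, eta2, r2 = parabolic_regression_check()
print(lhs, rhs, eta2, r2)
```

For exactly parabolic array means the first two values should agree to rounding error. The sign to attach to the square root in (55) is that of the coefficient of the quadratic term of the regression curve.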


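The use of these approximations will save the direct calculation of $q_{x^2y}$ and
$q_{xy^2}$ provided we can determine the signs to be attached to $\sqrt{{}_x\eta_y^2-r_{xy}^2}$ and
$\sqrt{{}_y\eta_x^2-r_{xy}^2}$. This is often easily done by inspection of the regression curve whose
equation is§

$$\frac{\bar y_x}{\sigma_y}=r_{xy}\frac{x}{\sigma_x}+\frac{\sqrt{{}_x\eta_y^2-r_{xy}^2}}{\sqrt{\beta_2-\beta_1-1}}\Big\{\frac{x^2}{\sigma_x^2}-\sqrt{\beta_1}\,\frac{x}{\sigma_x}-1\Big\}.$$

We can approximate to $q_{x^2y^2}$ as follows‖:

$$q_{x^2y^2}=\frac{SS(n_{xy}\,x^2y^2)}{N\sigma_x^2\sigma_y^2}=\frac{S\{n_x\,x^2(\sigma_{y_x}^2+(Y_x-\bar y)^2)\}}{N\sigma_x^2\sigma_y^2},$$

where $Y_x$ is the value given by the regression straight line and $\sigma_{y_x}^2\times n_x$ the
second moment of the array of $y$'s for a given $x$, about the point $Y_x$.

But

$$\frac{S\{n_x\,x^2(Y_x-\bar y)^2\}}{N\sigma_x^2\sigma_y^2}=\frac{S(n_x\,x^4\,r_{xy}^2)}{N\sigma_x^4}=\beta_2\,r_{xy}^2.$$

Thus

$$q_{x^2y^2}=\frac{S(n_x\,x^2\sigma_{y_x}^2)}{N\sigma_x^2\sigma_y^2}+\beta_2\,r_{xy}^2.$$

Similarly

$$q_{x^2y^2}=\frac{S(n_y\,y^2\sigma_{x_y}^2)}{N\sigma_x^2\sigma_y^2}+\beta_2'\,r_{xy}^2;$$

or, so far without approximation

$$q_{x^2y^2}=\frac{S(n_x\,x^2\sigma_{y_x}^2)+S(n_y\,y^2\sigma_{x_y}^2)}{2N\sigma_x^2\sigma_y^2}+\tfrac12(\beta_2+\beta_2')\,r_{xy}^2.$$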

* It is noteworthy that the hypothesis that regression of $z$ on $x$, $y$ although of 2nd degree is such
that regression of $z$ on $x$ for a constant $y$ is linear leads to the result that the total regression of $z$ on $x$
is parabolic.

† Pearson, l.c. p. 28, Eqn. (lxiii).

‡ Pearson, l.c. Eqns. (li), (xlv) and (xiii).

§ Pearson, l.c. Eqn. (lxv).

‖ It can be found fairly directly by tabling to the squares of the variates, when we need a simple
product moment. In a later part of this paper some comparisons of actual and approximate values for
numerical cases will be found.
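Now the mean values of $\sigma_{y_x}^2$ and $\sigma_{x_y}^2$ are known to be

$$\sigma_y^2(1-{}_x\eta_y^2)\quad\text{and}\quad\sigma_x^2(1-{}_y\eta_x^2),$$

and the deviations from these are usually somewhat irregular. Rarely can we
do anything better than assume them to vary with a slight linear variation from
the mean. For example

$$\sigma_{y_x}^2=\sigma_y^2(1-r_{xy}^2)(1+\lambda x),$$

where $\lambda$ is small. In such a case

$$\frac{S(n_x\,x^2\sigma_{y_x}^2)}{N\sigma_x^2\sigma_y^2}=(1-r_{xy}^2)(1+\lambda\sigma_x\sqrt{\beta_1}),$$

or to a fair degree of approximation*, we may put

$$q_{x^2y^2}=1-r_{xy}^2+\tfrac12(\beta_2+\beta_2')\,r_{xy}^2,$$

and thus write

$$\begin{aligned}
q_{x^2y^2}-r_{xy}^2&=1+\tfrac12(\beta_2+\beta_2'-4)\,r_{xy}^2\\
&=1+r_{xy}^2+\tfrac12(\beta_2-3+\beta_2'-3)\,r_{xy}^2.
\end{aligned}$$

The latter part of this expression vanishes if the frequency of the $x$ and $y$ variates
be mesokurtic. It can of course be retained if desired but its product with
$\sqrt{{}_{xy}H_z^2-{}_{xy}R_z^2}$ will usually be of the second order. If we write

$$\nu=\tfrac12(\beta_2-3+\beta_2'-3),$$

we find the approximate regression surface

$$\frac{\bar z_{xy}-\bar z}{\sigma_z}=\frac{r_{xz}-r_{yz}r_{xy}}{1-r_{xy}^2}\,\frac{x-\bar x}{\sigma_x}+\frac{r_{yz}-r_{xz}r_{xy}}{1-r_{xy}^2}\,\frac{y-\bar y}{\sigma_y}
\pm\sqrt{\frac{{}_{xy}H_z^2-{}_{xy}R_z^2}{1+r_{xy}^2+\nu\,r_{xy}^2}}\,\Big\{\frac{(x-\bar x)(y-\bar y)}{\sigma_x\sigma_y}-r_{xy}\Big\} \tag{57}$$

This equation (57) enables us to express approximately the multiple ${}_{xy}H_z$ in
terms of the simple $\eta$'s, ${}_x\eta_z$, ${}_y\eta_z$, ${}_x\eta_y$, ${}_y\eta_x$.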


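§ 7. To obtain this connection between the multiple ${}_{xy}H_z$ and the simple $\eta$'s
we may proceed as follows:

$$\frac{\bar z_{xy}}{\sigma_z}=d+a\frac{x}{\sigma_x}+b\frac{y}{\sigma_y}+c\frac{xy}{\sigma_x\sigma_y},\quad\text{origin at mean.}$$

Hence keeping $y$ constant and summing for the $x$'s

$$\frac{\bar z_y}{\sigma_z}=d+a\frac{\bar x_y}{\sigma_x}+b\frac{y}{\sigma_y}+c\frac{\bar x_y\,y}{\sigma_x\sigma_y} \tag{58}$$

$${}_y\eta_z^2=\frac{S\{n_y(\bar z_y-\bar z)^2\}}{N\sigma_z^2}=\frac{S(n_y\,\bar z_y^2)}{N\sigma_z^2},\qquad
{}_{xy}H_z^2=\frac{SS\{n_{xy}(\bar z_{xy}-\bar z)^2\}}{N\sigma_z^2}=\frac{SS(n_{xy}\,\bar z_{xy}^2)}{N\sigma_z^2}.$$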

* I.e. we are neglecting terms of the second order as $\lambda\sqrt{\beta_1}$.
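$$\begin{aligned}
\therefore\ {}_{xy}H_z^2-{}_y\eta_z^2
&=SS\Big\{\frac{n_{xy}}{N}\,\frac{x-\bar x_y}{\sigma_x}\Big(a+\frac{cy}{\sigma_y}\Big)\Big[2\Big(d+\frac{by}{\sigma_y}\Big)+\Big(a+\frac{cy}{\sigma_y}\Big)\frac{x+\bar x_y}{\sigma_x}\Big]\Big\}\\
&=S\Big\{2\Big(a+\frac{cy}{\sigma_y}\Big)\Big(d+\frac{by}{\sigma_y}\Big)\frac{S\{n_{xy}(x-\bar x_y)\}}{N\sigma_x}\Big\}+S\Big\{\Big(a+\frac{cy}{\sigma_y}\Big)^2\frac{S\{n_{xy}(x^2-\bar x_y^2)\}}{N\sigma_x^2}\Big\}\\
&=0+S\Big\{\Big(a+\frac{cy}{\sigma_y}\Big)^2\frac{S\{n_{xy}(x^2-\bar x_y^2)\}}{N\sigma_x^2}\Big\}.
\end{aligned}$$

Now

$$S\{n_{xy}\,x^2\}=S\{(x-\bar x_y+\bar x_y)^2\,n_{xy}\}=n_y(\sigma_{x_y}^2+0+\bar x_y^2),$$

$$\therefore\ {}_{xy}H_z^2-{}_y\eta_z^2=S\Big\{\Big(a+\frac{cy}{\sigma_y}\Big)^2\frac{\sigma_{x_y}^2\,n_y}{N\sigma_x^2}\Big\},$$

and, replacing $\sigma_{x_y}^2$ by its mean value $\sigma_x^2(1-{}_y\eta_x^2)$,

$$=(1-{}_y\eta_x^2)\,S\Big\{\frac{n_y}{N}\Big(a^2+2ac\frac{y}{\sigma_y}+c^2\frac{y^2}{\sigma_y^2}\Big)\Big\}=(1-{}_y\eta_x^2)(a^2+2ac\times0+c^2),$$

$${}_{xy}H_z^2-{}_y\eta_z^2=(1-{}_y\eta_x^2)(a^2+c^2) \tag{59}$$

Similarly

$${}_{xy}H_z^2-{}_x\eta_z^2=(1-{}_x\eta_y^2)(b^2+c^2) \tag{60}$$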
Remembering that $a=\gamma_3+c\,\theta$, $b=\gamma_3'+c\,\phi$ we get from (59) and (60)


a ra) be ay? — Ne 
woe See ee : SF le ¥5%3 (sh — ys 9) 


1- Qa 
; tiGe {ys o— 39 — Ob (ya = Ys, 0)} AERC (61). 
From the values of ys, y;, 9, @ in (9), (10), (41), (42) we obtain easily 


yzYary — Vaz Vay? 
/ yz 4ary az Yay 
Yah — ys = "een Teeter 
Seay 
/ Vaz Q x2 oan Tye Jay? 
y3 6-730 = 
1 — 1 xy 


Ya  — 29 — Of (ys — ys 9) 
— PaeQary — TyeGiy? (Txy Yay? — Yary) (Cay Jary — Yay?) Cyz ary — Vaz Gay?) 
LP ey (1 — rey)? 


and c? is given by (52). Hence (61) may be written 


(Qe —Vxz Try) (Tey Vay? — Yury) (7 az — Ty2? ay) @ wy Yay? — Ya? y) 


x lai x R? = x = ~ 
ie es (1 = xy)? CL = ya") (1 — ray)? (Ll — amy’) 
(TxzGary = Tyzay ) ne (Vey ary? = Qa? 2y) (? "ary Yay y — Yay? Cae, = "ae Gey)) 
6 tly (L= 722) f 
2 24 ae ay? —s 2 wy2Y a2 Vg ) 
(Gu22 — Tay) — CZ yt GF a = wary Vay | 
ay 
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mau yz — “a2: ey) (7 xy Vary ane dynes CG wz — Tye ‘ny) (Vay Vay? — = x2 a) ane 

(1 ile oe (1 — yNax’) (_l— Trey) d ort y N22) 

(122 — TyzTay) (yz — Vr2V ay) (Tyz x2y — 22 J ny?) 

( es Tr) 

pis (re = Vaz" ny) Pay ary — Ixy) xz —Py2"ny) Cry Yow? — dev} (62) 
avis a ; aa eee i 
(1 — rx)? (1 = yn2?) (1 — ray)? (1 — any’) 

In general the square of any correlation ratio (ordinary or generalised) differs
little from the square of the corresponding correlation coefficient; also we have
seen in (55) that an approximate value for $q_{x^2y}$ is

$$r_{xy}\sqrt{\beta_1}+\sqrt{{}_x\eta_y^2-r_{xy}^2}\,\sqrt{\beta_2-\beta_1-1}.$$

Now $\beta_1$ is itself in general small, so that without making any assumption as to
the relative order of magnitude of $\beta_1$ and ${}_x\eta_y^2-r_{xy}^2$ we may safely treat $q_{x^2y}$ and
$q_{xy^2}$ as small quantities.

Thus it appears that (62) is an equation in which all the terms involved are
small, and a certain amount of care is required in deducing from it an approximation
to the value of ${}_{xy}H_z^2-{}_{xy}R_z^2$.


We have 
1 ms 1 y Ne — T ny 


fe 9 = i 2 7 ) ar sa 
Lyn” LP gy nc —yn2)(1— Tey) 1 — Pgy Si Say (63) 
1 1 1 “Ny? =e te wl. = g/ releist er > 
1 ays L— 1 yy eed) May ; 
yz ee. ( 1 ) 2 2 ( 1 i 
ay ee Smee eat yaar =r ar 1 > 4 2 Ar 
lie ya WMI May 1 (ume a) ie rey &) 
oy at ha SE Aan re a (64), 
Lips 
where 2, = Tay (ya! oe ay) ye ~My and d= (yn? —T ‘ew) (y Ma" zs ny) 
G2) Gary) | lr, (1 — ynz’) (1 =1%xy) 
: ane oy / , Seay 
while = Sire alae vay ats So Seat ease reine (64, 


where ),’, A. may be Agere on Ai, A» by an interchange of # and y in the 
suffixes. 


1 1 
Finally ~ ee ate antonnes (65), 
Quy? Fay t Paty — 24ay2 2x2 ihe. ~ ee ye r xy 
CAE xy 2 
L— Prey 
where 
Ko = Pay + Prey = 24ay aryl xy 


Fay? t+ Pay = ‘Qaey? Ta? y “ay) zis 


a2 
pal 


The suffixes of &,, &/,-’,, Xo, Ay, Xo’ and «, denote the order of smallness of 
these terms. 


(1 77 5))(Qa242 — Tny) (dur eee 
We make the corresponding substitutions in equation (62), including the value


Tee hale ee 27 eV yx Manz Se 
ay 


age R? on the right-hand side and obtain the following 
et Tey 


accurate formula for ,,H?—,,R2: 


¢ ; (7 "ye — Vaz i) © “ay Vary Yay2) G 1 ) 
2 2 ve \" vy x en 
(ay Jal z wy R, ) | qd Tay 1—r & 


(xz yz s Gi ay a ary) (= ti) 
(1 cay Pe De an 
ie) {ste ~VyzFay? (Tyz Qa2y — Vaz Jay? 2) (7, ay Jay? — Ya? y) @ "ey Vary — Gy)| 
The ray (l-r wan) ) 


1 
———— 
Qaty? — ay 


= (Tyz i? NazT ey) (Try Qaty ay?) ( re + Ny + 7) 


(1 = rxy)? a7, 
ne meine! yeVay) (Pry Jay? — x2 y) ( ers , ‘ 
(1-7) ee 
nae yz Jat ira ne Voesj2) (7. ez Speier) (Giip = ieee) 
(_-r a 
— ye tae — Bsa ye ay | ce = ae ey) (Tay Ja2y ~ Gov) ( oe ) 
(= 7a) (Lr)? barges 
(Taz — TyzT xy) (Tay Try? — uty) ( Se eae ‘| " 
v6 (l—7,,)° las + EY lili ees (66). 


The right-hand side of (66) apparently contains terms of the first order, while
on the left the lowest order occurring is the second. But the coefficient of $q_{x^2y}$ in
the first order terms is


1 
= yy ae aa Vee lacy) TayT zy ar (Cie — Pyzt 7) Tog + Lyz (Px oar Prehiey) (Gere a Toe Tay) 
xy 
1 : 
ee (1— rzy)! {yz —Vxz ie) Vey + (Taz — Lyz Tny) Gz + Px — Papel Piha) 
— Pry 


and vanishes identically. Similarly the coefficient of $q_{xy^2}$ is zero and thus
equation (66) is a relation between terms of second and higher orders.

A first approximation then may be obtained by equating second order terms.
This gives


(ay H 2 ~~ ay 


R2) (Tyz— Paz xy) (Vay Yary — Yay?) — (Vez — Vyz" ay) ("xy Qry? — Yay) 
3 (1 = 7x)? 


— Paz Qary — Vyz ay? | 
(1 = ray) (Gaye — May) 
_ (Vyz = Vaz xy) (Pay Very — Yay’) E yz + ae — = 2x27 yzV ry iB | 
= al a Tay)? 1 
oy (ae an VyzV ay) Gra Gav aa ary) [a fol r yet T 2% — 2Px2VyzT ny é| (67) 
(1 = ,,? E A hens AA eat ore ; 


Ta 
The coefficient of ${}_{xy}H_z^2-{}_{xy}R_z^2$ on the left reduces to


1 1-7, 
( — ry) [Qu ylaz — Yay? 2Tyz| (1- cs i 


dhe ya Pony 


Pye tT a2 — 20 x2P yzP oy = Ne — Tay é (TyzV xy = Vee)” 
ae Uae Ee 


— 
1-7, Lr ay l= Pp 
a yNe — 3 ”, aii (ya — Tiny) (fer ie =) (68) 
Bee (1 y 1 — yn Teter oes Gane , 
Similarly 


a tape ye a De ce Nye see (Ue ae TNC 
eee uz wel yz! wy er _ ale za wily z( pesca ) me 
1 it os Tn) E, 1 —/ a il oof Tye 1 a fy (69). 
We shall still be correct to second order terms if when using (68) and (69) in
(67) we replace $1-{}_y\eta_x^2$ and $1-{}_x\eta_y^2$ occurring in denominators by $1-r_{xy}^2$; so
that
(oy He — sy 2) oe 


ey 


w-¥* 


Yr, ay V2 yo Yay? Lr Ye — Vyz21. xy 9 9 Vyz Vay — Vaz 9 9 
= z ; = : 5D Gne == T*2y) a Ti = Fr Gar = Te) 
ay E 


Qa2yVaz — Yay?! a 7 ay 


Vay Vay? — Ya2ry Ve, = VyeT xy 2 3 YoeVay ~ Tyz . 
a (Car i Tt) a 1— 7, Ment = Ta) 


Yay?’ yz ~ Yx2y Vaz 1 — Pay 


§ 8. This result is of importance, as it shows that the heavy labour of the
direct calculation of the generalised correlation ratio can be replaced by the
calculation of four simple correlation ratios.

The coefficients involved are the ordinary coefficients of linear regression
denoted above by $\gamma_3$, $\gamma_3'$ and expressions involving product moments of orders
3 and 4. To these latter we may approximate by the methods of § 6.

Qaryx- 1. Array 


A good approximation for —“*—— 1s _ . Ifgreater accuracy be needed 
Gary? — May + py 


Jury — 1 aa (4(A.+ Bo - 6) + ZT ay 
Gury — Pry 1 +1 ay +4 (Bs + Bs — 6) Pay 
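We saw in § 6 (equation 55) that

$$\left.\begin{aligned}
q_{x^2y}&=r_{xy}\sqrt{\beta_1}+\sqrt{{}_x\eta_y^2-r_{xy}^2}\,\sqrt{\beta_2-\beta_1-1}\\
q_{xy^2}&=r_{xy}\sqrt{\beta_1'}+\sqrt{{}_y\eta_x^2-r_{xy}^2}\,\sqrt{\beta_2'-\beta_1'-1}
\end{aligned}\right\}$$

approximately. $\beta_1$ and $\beta_1'$, which are zero in normal correlation, will in general
be very small compared with ${}_x\eta_y^2-r_{xy}^2$ and ${}_y\eta_x^2-r_{xy}^2$ so that we may use

$$\frac{r_{xy}\,q_{x^2y}-q_{xy^2}}{q_{x^2y}\,r_{xz}-q_{xy^2}\,r_{yz}}=\frac{r_{xy}\sqrt{(\beta_2-1)({}_x\eta_y^2-r_{xy}^2)}-\sqrt{(\beta_2'-1)({}_y\eta_x^2-r_{xy}^2)}}{r_{xz}\sqrt{(\beta_2-1)({}_x\eta_y^2-r_{xy}^2)}-r_{yz}\sqrt{(\beta_2'-1)({}_y\eta_x^2-r_{xy}^2)}}$$

and

$$\frac{r_{xy}\,q_{xy^2}-q_{x^2y}}{q_{xy^2}\,r_{yz}-q_{x^2y}\,r_{xz}}=\frac{r_{xy}\sqrt{(\beta_2'-1)({}_y\eta_x^2-r_{xy}^2)}-\sqrt{(\beta_2-1)({}_x\eta_y^2-r_{xy}^2)}}{r_{yz}\sqrt{(\beta_2'-1)({}_y\eta_x^2-r_{xy}^2)}-r_{xz}\sqrt{(\beta_2-1)({}_x\eta_y^2-r_{xy}^2)}}.$$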
In the important case $\beta_1=\beta_1'=0$ and ${}_x\eta_y={}_y\eta_x=r_{xy}$, these approximations
fail, and so does the process by which (70) was obtained, as $\theta$ and $\phi$ vanish and
(61) is indeterminate.

We must then fall back on equation (59)

$$\begin{aligned}
{}_{xy}H_z^2-{}_y\eta_z^2&=(a^2+c^2)(1-{}_y\eta_x^2)\\
&=(\gamma_3^2+c^2)(1-r_{xy}^2)\quad\text{since }\theta=0\text{ and }{}_y\eta_x=r_{xy}\\
&=\gamma_3^2(1-r_{xy}^2)+\frac{{}_{xy}H_z^2-{}_{xy}R_z^2}{q_{x^2y^2}-r_{xy}^2}\,(1-r_{xy}^2),
\end{aligned}$$

or

$$({}_{xy}H_z^2-{}_{xy}R_z^2)\Big(1-\frac{1-r_{xy}^2}{q_{x^2y^2}-r_{xy}^2}\Big)={}_y\eta_z^2-{}_{xy}R_z^2+\gamma_3^2(1-r_{xy}^2)$$

neglecting 3rd order terms.

The right-hand side reduces to ${}_y\eta_z^2-r_{yz}^2$, so that

$${}_{xy}H_z^2-{}_{xy}R_z^2=\frac{q_{x^2y^2}-r_{xy}^2}{q_{x^2y^2}-1}\,({}_y\eta_z^2-r_{yz}^2) \tag{71}$$

$$=\frac{1+r_{xy}^2}{2r_{xy}^2}\,({}_y\eta_z^2-r_{yz}^2)\quad\text{approximately;}$$

of course in these circumstances (60) would lead to the value

$${}_{xy}H_z^2-{}_{xy}R_z^2=\frac{1+r_{xy}^2}{2r_{xy}^2}\,({}_x\eta_z^2-r_{xz}^2) \tag{72}$$

showing that if $\beta_1=\beta_1'=0$ and if $({}_x\eta_y^2-r_{xy}^2)-({}_y\eta_x^2-r_{xy}^2)$ is of higher order than
the first then $({}_x\eta_z^2-r_{xz}^2)-({}_y\eta_z^2-r_{yz}^2)$ is also of higher order than the first.

We shall now seek relations between the six correlation ratios of three
"hyperbolic" variates.
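From (59) and (60) we get, on eliminating ${}_{xy}H_z$,

$$\begin{aligned}
{}_y\eta_z^2-{}_x\eta_z^2&=(b^2+c^2)(1-{}_x\eta_y^2)-(a^2+c^2)(1-{}_y\eta_x^2)\\
&=\gamma_3'^2-\gamma_3^2+\gamma_3^2\,{}_y\eta_x^2-\gamma_3'^2\,{}_x\eta_y^2+2c\,(\gamma_3'\phi-\gamma_3\theta)+2c\,(\gamma_3\theta\,{}_y\eta_x^2-\gamma_3'\phi\,{}_x\eta_y^2)\\
&\quad+c^2(\phi^2-\theta^2+\theta^2\,{}_y\eta_x^2-\phi^2\,{}_x\eta_y^2+{}_y\eta_x^2-{}_x\eta_y^2)\\
&=(\gamma_3'^2-\gamma_3^2)(1-r_{xy}^2)+\gamma_3^2({}_y\eta_x^2-r_{xy}^2)-\gamma_3'^2({}_x\eta_y^2-r_{xy}^2)\\
&\quad-2c\,(\gamma_3\theta-\gamma_3'\phi)(1-r_{xy}^2)\\
&\quad+\{2c\gamma_3\theta({}_y\eta_x^2-r_{xy}^2)-2c\gamma_3'\phi({}_x\eta_y^2-r_{xy}^2)\\
&\qquad+c^2(\phi^2-\theta^2+\theta^2\,{}_y\eta_x^2-\phi^2\,{}_x\eta_y^2+{}_y\eta_x^2-{}_x\eta_y^2)\}.
\end{aligned}$$

The terms enclosed in braces are second order terms. Neglecting these and
noting that

$$(\gamma_3'^2-\gamma_3^2)(1-r_{xy}^2)=r_{yz}^2-r_{xz}^2\quad\text{and}\quad(\gamma_3\theta-\gamma_3'\phi)(1-r_{xy}^2)=r_{yz}\,q_{xy^2}-r_{xz}\,q_{x^2y},$$

we obtain the following equation for $c$:

$$({}_y\eta_z^2-r_{yz}^2)-({}_x\eta_z^2-r_{xz}^2)-\gamma_3^2({}_y\eta_x^2-r_{xy}^2)+\gamma_3'^2({}_x\eta_y^2-r_{xy}^2)=2c\,(r_{xz}\,q_{x^2y}-r_{yz}\,q_{xy^2}) \tag{73}$$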
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24/2 P i 
Now (eH? - mR 2) fee 2c?7*,, 1f we neglect second order terms. If 
ary? a ey 


we use the value of c given by (73) in (70), and adopt the notation ,U, for 
ane — Vx, We have 
Tvy ty Up: 2 U, rae Ys y OF aF pee Gi 
= 2 (TuzQery — Py Gay?) (Pay Gary — Gey) Ys (ye — Ys? y Ur) 
— (Pry Gey? — Yury) Ys (@Uz — Y2?2Uy)} 0... (74). 
Let us write »x, for ,U,—;?,Uy and yx, for ,U, — y3°,Uz, then (74) becomes 


(pe =e apeay Tey = 2 Gee Gy — yz Jay?) {rys. Cary oF Joy?) yXz— Ys Cosy Yay? — Yury) aXet 


(75) is a relation between second order terms and it is sufficient to use 
equation (55) for qaz, with 8, replaced by zero and B, by 3, so that 


arty = "2,0 y, 
ay? = V2,Uz, 
(X2— y Xz)? Tay = 4 (Taz VUy eye VU) Ls y Xz (Tay VpUy i VU 2) 
= Wane Vay Vp Um— Nay) Weoeeon (76). 
This identity between ${}_x\eta_z$, ${}_y\eta_z$, ${}_x\eta_y$ and ${}_y\eta_x$ is symmetrical in $x$, $y$. Two more
such identities may be obtained by interchanging the letters $x$, $y$, $z$ in cyclic order*.
There are therefore three identities between the six correlation ratios:

$${}_y\eta_x,\ {}_x\eta_y,\ {}_y\eta_z,\ {}_z\eta_y,\ {}_x\eta_z,\ {}_z\eta_x .$$

I have not so far succeeded in reducing them to simpler forms, although possibly
such exist. In special cases simplifications result. These are illustrated in the
following section.


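§ 9. We defined $\gamma_3$, $\gamma_3'$ the regression coefficients of $z$ on $x$, $y$ by the equations

$$\gamma_3=\frac{r_{xz}-r_{yz}r_{xy}}{1-r_{xy}^2},\qquad\gamma_3'=\frac{r_{yz}-r_{xz}r_{xy}}{1-r_{xy}^2}\qquad\text{(9) and (10).}$$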


ee diaellv 1 Vee ~ Vy yz 
Dae Pe Se le a a ae 
Pye SE (77) 
_ Vyz — Vaz ay 1 Vay — ye" ee 
Dee 76 2 nk amare 2 
— 1" xg 1-7, 


It will simplify the algebra if we use X”, p?, v°, XN, w, v? for yUy, -U,, 2U2, Ue, 


Buy. je respectively and P,Q, RP”, QR’ for yes eXys aXe aXe» aXu» vXe 
jeapectively so that 


P= - ye”, Q= fe Ya? W?, R=r—- YH) 
Dee Ne yy? oe, Q’ an Me es yo? y?, Rea nyt? J Siesietase seats 


* We are supposing here that the regression surfaces of $x$ on $y$, $z$ and of $y$ on $z$, $x$ are also
hyperboloids of type similar to (35).

The three identities connecting the six simple $\eta$'s become in this notation
(R— RB’) ray = 4 (rz f= Pyzd) {ys BY (ray pw! — 2) = aR (Tyr — bY} --(79), 
(P= PP rye = 4 (ye ¥! = Tex) (yy P" (Pye ¥ = Bb) — HP (Tyee — v’Y} ---(80), 
(Q a Or =4 (12 — Vy V) {2 Q GaN —V) 9, Q Ca = ny} ...(81). 
(i) We can deduce from (79) that if the correlations of both $x$ and $y$ on $z$ be
linear and equal, then the correlation ratio of $x$ on $y$ and $y$ on $x$, i.e. ${}_y\eta_x$ and ${}_x\eta_y$
are equal. Thus in biparental correlation, if the regression of the child on each
parent be linear, then the correlation ratio of the father on the mother is equal to
that of the mother on the father.

For under the conditions stated 


az = yNz = Vea = Vey 


Hence y,= 3 =- Je and (79) becomes 
1 ae Voy 
st DP = pee P rey = Arye (Mo A) [= 8A? (Pay — A) + YH? (Tay r — BY}, 
oe 3% — BYP A+ BYP Pay = Ary (X= WY {May Apel — 2? — Dye! — 7h, 
Us (A — bw’)? a (A+ WP — 4 (ray rp! — Ape’ — DV? — we?) | = 0, 
ey 


which reduces to 


No RG +2?A+(2—717%,,) pi ti 4” (27 ay + 3) (Tay + aa 
A-pwy ~ — : ; : = 0. 
( i) | (es ap i) (Tay ap 2)? 


But rzy is numerically <1. .*. the factor in curved brackets is positive. Hence 


Niles Ope Cee eT 

(ii) An interesting deduction from the identities (79)...(81) is the following:
"If any four of the six regression lines that occur in the mutual variation of three
variables are linear, so are the other two."

We have to prove that if any four of the six quantities $\lambda$, $\mu$, $\nu$, $\lambda'$, $\mu'$, $\nu'$ vanish,
then the remaining two vanish as well.

Hirst let 4p = — Ae 10; 


(79) gives Toy? (Ys fe? + VP = Are w [Ye VEN ny ft — Yas 1°}, 
(80) NV Ty? = — ray nn”, 
(81) bg = 
From (80) yy [Aes Pye + yr 27y27] = 0. 
But Ary, Vay ar wees = (ee ee — at) >0. 
Wy =), 
and 0) 


and these satisfy (79). 
There are three cases of this type. 
The case $\lambda'=\mu'=\nu'=\lambda=0$ is proved in the same way. There are three cases
of this type as well.


Next take the three cases of type 


N=p=0, 
N= 0. 
Equations (79)—(81) become 
(79) i — se se ep yr. 
(80) Ny ry? = ey’ (yyy? (— vf, 
(81) ys”? = 4(—Tryv) {— Yeo y2V? (— vd}, 
(80) leads to (ry By Array eya) yt = 0. 


We have already shown that the first factor is positive, 
pv =0, 
and hence v= 0, 
and these values satisfy (81). 
The three cases of type λ = μ = μ′ = ν′ = 0 lead to
(79) yt 7, = 0, 
(80) Ny, = 0, 
(81). (ye? v? = y2?X?) Pye? = A (Pye — Pryv) [= 2 Yo? (Tae =v) + 222A? (Trev — VY}, 
whence λ′ = ν = 0.
There remain the three cases of type 
N—i— 0) 
(0. 
Here (79) is satisfied identically. 
(80) becomes OM? = YP YP Pye = — Aeryzps (ya (A? — 1?) (= wh, 
(81) (WP = YP A?P Tae = Aryzd" [= Yo (Mw? = yo ®”) (= VI, 
which reduce to 
(A? = 2?) [Pyed? = (WT yz + Ae’ Taz) w} = 0, 
(w? — 2? d’?) et — (279 2 + Ayo) "yz) 7} es 
The only common solution of these equations is 
0. 
We have thus accounted for all the fifteen possible cases. 
(iii) Three regression curves linear.
In six cases out of the possible 20 cases the linearity of three only of the
regression curves involves the linearity of the remaining three.
Let λ = μ = 0 and either ν or ν′ = 0.
It follows from (79) that R = R′, i.e. ν = ν′.
∴ both ν and ν′ are zero. We have now four linear regression curves, ∴
six are linear.
Let μ = ν′ = 0 and μ′ = 0.
Since; = v= 0, 3) P= or N= eesontliat 
P= — yee NeSore, 
P22 = per — yo Ren 
(79) becomes (v? + 932A?) ey = 40 yes? (Tay? — Ys"¥s V2), 
(80) becomes 
(ye? v? — rye? MP)? Pye = A (Tyzd — Pry V) Yayo ¥? [(Yov — Ya’) + Taz (q2'V — Y2X)]- 
The first reduces to 


{x Val Cee + 1 yP xy) Tey — :  2ryz 


Ven == (0) 
Se) a we 


and the second to 
(av? — yo! rN)? Ie Ayarye, (Mz — Tey vy) v? = 0, 


all 


Hence either λ = ν = 0 or there must be a very special relation between
$r_{xy}$, $r_{yz}$, $r_{zx}$.

If instead of μ′ = 0 we take ν = 0 we get similar results, i.e. in general the
vanishing of μ, ν′, μ′ or of μ, ν′, ν involves that of λ, ν, λ′ or of λ, μ′, λ′.
This accounts for six more cases.
There are eight left. Of these six are typified by 
fp =7=0=y 50m ut 
and lead to the same conclusions. 
The remaining two are
λ = μ = ν = 0 or λ′ = μ′ = ν′ = 0.
The first supposition, λ = μ = ν = 0, gives
Pa=-yv?, Q=— 9°, R= yp, 
P=nr2, One RR’ = p?, 
leading to 
(79) (vps My? Tye = bres Wye! (Pay? — Ys 7}, 
(80) 2+ yy?! 3 y, = Atay VO [Tyed? =v? | 
(81) (2 + yg? A)? 729, = Ary Neyo" {Taz fl'? —. Yas NJ, 


which give λ′ = μ′ = ν′ = 0 or a very special condition to be satisfied by the
correlation coefficients $r_{xy}$, $r_{yz}$, $r_{zx}$.
We may conclude then that in general the linearity of any three of the six
regression lines involves that of the remaining three.


(iv) If the regression surface of z on x, y reduces to a plane, the regression
curves of x on y and y on x reduce to straight lines.


We have as in § 7 


z, Goya) Gj 
LU eg gph OEE Ne als, (58). 
on: Ty, Cy OxFy 
@, 4 ya Pay ie > Y 
But Pe ear EEE EEA CHa AB YC ces if, 
ox aca “ Bi Bat oye Ce 
By d+a ie here ee Date, (av B, +¢) Vee Me — peal Ped 
Oz oe B, =e oy é ne -8, = 
Op = B (yt? — 9" UR a), ies Poy “} 
cemeg WC aye C aye ae +a SET 
oy (( i B2— 8-1 Pris 


+ terms of higher order.


Now the regression of z on y for a constant x is linear. Therefore the
coefficient of y² is zero. To first order terms we may put β₁ = 0 and β₂ = 3,
and thus

CV ey ats a ee = (0,
ae — x

Similarly

Clay £ b ne i 9 =f =0,

But c vanishes when

ga hee = lags by (52).

Hence if

pal = uy Bi,

it follows that

${}_x\eta_y = {}_y\eta_x = r_{xy}$.


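We thus see that if the three generalised correlation ratios ${}_{xy}H_z$, ${}_{yz}H_x$, ${}_{zx}H_y$
are equal to ${}_{xy}R_z$, ${}_{yz}R_x$, ${}_{zx}R_y$ respectively, the six correlation ratios ${}_x\eta_y$, ${}_y\eta_x$,
${}_x\eta_z$, ${}_z\eta_x$, ${}_y\eta_z$, ${}_z\eta_y$ reduce to the corresponding correlation coefficients $r_{xy}$, $r_{xz}$, $r_{yz}$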


and that the “linearity” of the three regression surfaces involves the linearity of
the six regression lines.


I. On Spurious Values of Intra-class Correlation Coefficients arising
from Disorderly Differentiation within the Classes.


By J. ARTHUR HARRIS, PH.D. Carnegie Institution of Washington, U.S.A. 


WHEN the constants of the x and y characters of the population in $r_{xy}$ are quite indistinguishable
symmetrical tables* may be used, but not otherwise.


Primarily and for the most part, however, the use of symmetrical tables has been restricted
to cases in which the degree of interdependence between the measures of all possible pairs† drawn
from a considerable series of associated individuals (in short, to intra-class correlations‡) is
sought.


The danger of spurious correlation due to the artificial symmetry of the surface is then much
greater§. Pearson|| long ago pointed out that when intra-class differentiation exists, for example,
because of age in the case of characters determined upon the members of a fraternity, or of
position on the axis in the case of serial organs, the values of r may be to some extent spurious.


In the cases considered by Pearson differentiation is an orderly phenomenon, i.e. the magni- 
tudes under consideration increase or decrease with age, position on the axis, or some other 
extrinsic characteristic with such regularity that the relationship can be expressed by an 
equation which may be used in correcting the raw values of r.

In other cases, the problem is not so simple. Differentiation within the class may exist, but
it may be difficult or impossible to arrange the individual measurements by any character outside 
of themselves to obtain the constants necessary for determining the true correlations from the 
spurious values deduced from the tables. 


Illustration I. The correlation between yields of wheat in variety testing.


In variety testing, the experimenter seeks (or should seek), among other things, to determine 
the correlation between yields of varieties in different years. If this correlation be 0 (and regression
be linear) it is clear that the yield of a variety in one year furnishes no basis for prediction
concerning its yield in any subsequent year. If, on the other hand, the correlation be high,
prediction from a few years’ test may be made with great probability of certainty.


* R. Pearl, Biometrika, Vol. V. pp. 249–297, 1907; H. S. Jennings, Journ. Exp. Zool. Vol. XI.
pp. 1–134, 1911; J. Arthur Harris, Biometrika, Vol. VII. pp. 325–328, 1910.

† K. Pearson and others, Phil. Trans., A, Vol. CXCVII. pp. 285–379, 1901; K. Pearson and
A. Barrington, Eugenics Laboratory Memoirs, No. V, 1909.

‡ Biometrika, Vol. IX. pp. 446–472, 1913.

§ With only one pair of measures the probability of spurious correlation is, in cautious work, very
slight, for the possibility of differentiation can be easily tested by the critical comparison of the physical
constants.

|| Pearson, K., “On Homotyposis in Homologous but Differentiated Organs.” Roy. Soc. Proc.
Vol. LXXI. pp. 288–313, 1903.


Given a measure of the “performance” of a series of varieties during a number of years it 
would at first seem quite allowable to form symmetrical tables or to use the intra-class formulae 
of a former paper* to determine the intra-varietal correlation, and to regard this as a satisfactory 
measure of the differentiation of the varieties and of the average prediction value of a year’s test. 
Such is, however, not the case, for while there may be no orderly change in yield throughout the 
period under consideration, the individual years differ greatly in their average yield for all the 
varieties. The influence of this “disorderly differentiation” upon r is admirably shown by
A. D. Hall’s† table of the yield in bushels of wheat in the Rothamsted experiments.


Let b = yield in bushels per acre of any one of m varieties in any one of n years, y₁, y₂ be the
“first” and the “second” years of a symmetrical intra-varietal correlation surface, v₁, v₂ be the
“first” and “second” varieties of a symmetrical intra-annual correlation surface. Then $r_{b_{y_1} b_{y_2}}$
will be a (spurious) measure of the (persistent) differentiation of varieties, $r_{b_{v_1} b_{v_2}}$ a (spurious)
measure of the differentiation (in the yield of all the varieties) of years. Applying formulae
(v)–(ix) of Biometrika, Vol. IX. p. 450, to these data, I find


$$S[n(n-1)] = 2128,$$
$$S[(n-1)\,\Sigma(b)] = 83122.5, \qquad S[(n-1)\,\Sigma(b^2)] = 3483626.4,$$
$$S[\Sigma(b)]^2 = 3610204.57, \qquad S[\Sigma(b^2)] = 370820.13,$$
$$\bar b = 39.0618, \qquad \sigma_b^2 = 111.257328,$$
$$r_{b_{y_1} b_{y_2}} = -.032.$$


The result is obviously spurious, for mere inspection of the entries in the table shows that
some varieties regularly give heavier yields than others. The source of the spurious value is to
be seen in the fact that an intra-class coefficient has been calculated from a symmetrical surface
formed from classes (varieties) represented by a series of yields differentiated by annual variations
in the growing conditions. By correcting for this source of differentiation by expressing each
yield as a deviation from the mean yield of all the varieties for the particular year, i.e. $b' = b - \bar b_y$,
where the bar denotes a mean and the subscript y that it is for all the yields of a year, I have
found‡

$$r_{b'_{y_1} b'_{y_2}} = .266.$$

Measuring the differentiation of years in terms of intra-annual correlation (intra-class correlation
in which each class is defined by the year and its individuals are the yields of the different
varieties grown), I find from Hall’s table

$$S[m(m-1)] = 4440,$$
$$S[(m-1)\,\Sigma(b)] = 174129.2, \qquad S[(m-1)\,\Sigma(b^2)] = 7317531.92,$$
$$S[\Sigma(b)]^2 = 7586436.21, \qquad S[\Sigma(b^2)] = 370820.13,$$
$$\bar b = 39.2183, \qquad \sigma_b^2 = 110.017719,$$
$$r_{b_{v_1} b_{v_2}} = .791.$$
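The arithmetic behind the two raw coefficients can be checked from the printed sums alone. The sketch below assumes that formulae (v)–(ix) of the earlier paper amount to the expressions coded here (mean, variance and product-moment taken over the whole symmetrical table); the note does not restate them, so the function name and argument names are illustrative only. Under that assumption the printed sums give −.032 and .791.

```python
def intraclass_r(pairs, w_sum, w_sum_sq, sq_of_sums, sum_of_sq):
    """Intra-class r from class-wise sums (assumed form of the formulae).

    pairs       S[n(n-1)]            entries in the symmetrical table
    w_sum       S[(n-1) * sum(b)]
    w_sum_sq    S[(n-1) * sum(b^2)]
    sq_of_sums  S[(sum(b))^2]
    sum_of_sq   S[sum(b^2)]
    """
    mean = w_sum / pairs
    variance = w_sum_sq / pairs - mean ** 2
    product_moment = (sq_of_sums - sum_of_sq) / pairs - mean ** 2
    return mean, variance, product_moment / variance


# Sums as printed for the intra-varietal surface (classes = varieties).
mean_v, var_v, r_varieties = intraclass_r(
    2128, 83122.5, 3483626.4, 3610204.57, 370820.13)

# Sums as printed for the intra-annual surface (classes = years).
mean_y, var_y, r_years = intraclass_r(
    4440, 174129.2, 7317531.92, 7586436.21, 370820.13)

print(round(r_varieties, 3), round(r_years, 3))
```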


Since the varieties have been shown to be differentiated, this result must also be spurious.
Let $b'' = b - \bar b_v$, where the v indicates that the mean denoted by the bar is for the yield of the


* Biometrika, Vol. IX. pp. 446–472, 1913.

† Hall, A. D., The Book of the Rothamsted Experiments, p. 66, 1905.

‡ Science, N. S. Vol. XXXVI. pp. 318–320, 1912. Probably a better method of dealing with such
cases will sometime be found. So far I have not succeeded.
variety for all the years it was grown. Correcting for the influence of the differentiation of
varieties in this way I have found*
= Ooi 


Pett Ug 


Thus season is a far more important factor than variety in determining an individual yield. 
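The effect of the deviation method can be seen on a small invented table. The following sketch uses hypothetical yields (not Hall's data) in which each variety differs persistently from the others but the seasons differ far more; the helper `intraclass_r` is an assumed implementation of the symmetrical-table coefficient, not a transcription of the formulae cited above. The raw coefficient comes out negative, while the coefficient computed from deviations from the yearly means recovers the persistent differentiation of the varieties.

```python
def intraclass_r(classes):
    """Intra-class correlation over all ordered pairs within each class."""
    pairs = sum(len(c) * (len(c) - 1) for c in classes)
    mean = sum((len(c) - 1) * sum(c) for c in classes) / pairs
    variance = sum((len(c) - 1) * sum(x * x for x in c)
                   for c in classes) / pairs - mean ** 2
    product = sum(sum(c) ** 2 - sum(x * x for x in c)
                  for c in classes) / pairs - mean ** 2
    return product / variance


# Hypothetical yields: rows are varieties, columns are years.
# Variety effects are 0, 1, 2; season effects are 0, 10, 20.
yields = [[0, 10, 20],
          [1, 11, 21],
          [2, 12, 22]]

raw_r = intraclass_r(yields)

# Deviation method: subtract from each yield the mean of all varieties
# for the year in which it was obtained.
year_means = [sum(col) / len(col) for col in zip(*yields)]
deviations = [[b - m for b, m in zip(row, year_means)] for row in yields]
corrected_r = intraclass_r(deviations)

print(round(raw_r, 3), round(corrected_r, 3))
```

In this artificial case the seasons are the only disturbing factor, so the corrected value is unity; with real data it would of course fall short of that.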


Illustration II. Influence of Personal Equation upon the Correlation between the Grades
assigned to the Same Paper by a Series of Instructors.


Stripped of the verbiage in which it has been clothed in discussions among pedagogues, one
of the chief problems concerning the reliability of the grades assigned in examinations resolves
itself into the statistical question: What is the correlation between the grades assigned to the
same paper by different instructors?


Let g be the grade assigned to any one of m papers by any one of n instructors, let i₁, i₂ be
the “first” and “second” instructor (of a symmetrical intra-class table) passing judgment
upon a paper, p₁, p₂ the “first” and “second” paper graded by the same instructor. Then
from Table I of D. Starch† I deduce, by the intra-class formulae (v)–(ix) of Biometrika, Vol. IX.
p. 450,

‘O71. 


= '659, 


Vo. gy /E = 
ij Vig 9p "no 


By using the deviation method as illustrated above, I have found 
P94 !ig = TB2 Pol'y Ol pg = B86: 
TABLE II.
Grades of Papers Assigned by Various Instructors.

[Table body not recoverable from the source; the table is indexed by Instructors and by Papers.]


Both of these results, in which an attempt was made to correct for the personal equation of the 
instructors in determining the correlation between the estimates of different instructors on the 
same paper, or to correct for the differences in merit of the papers in testing the individuality of 
the instructors, are higher than the raw values given above, which are clearly spurious. Similar
results‡ are obtained from Jacoby’s astronomical grades§.


* Science, loc. cit.

† Science, N. S. Vol. XXXVIII. p. 630, 1913.

‡ Personally, I can attach little pedagogical significance to series as short as those of either Starch
or Jacoby. They serve here as illustrations of method merely because I know of no more extensive
series.

§ Science, N. S. Vol. XXXI. p. 819, 1910.

The essentials of this note may be summarized as follows:


In using Intra-class coefficients care must be taken to guard against spurious values arising 
through differentiation among the individuals of the class. 

Besides the orderly differentiation (due to age of individuals, position of organs on axis, etc.) 
for which Pearson has determined corrective formulae in terms of correlation coefficients, a 
disorderly differentiation for which such corrective formulae have not as yet been found some- 
times obtains. Illustrations of such cases are here given. 

Probably the empirical methods used here in correcting for this disorderly differentiation 
should be replaced by formulae with a sounder theoretical foundation. This I have not as yet 
been able to do. 

The purpose of this note will have been served if it directs attention to a source of danger 
which may sometimes be encountered in the use of serviceable formulae, and indicates a method 
by which in the absence of more perfect methods practical results may be secured. 


Cold Spring Harbor, N.Y.
February 3, 1914. 


II. On an Extension of the Method of Correlation by Grades 
or Ranks. 


By KARL PEARSON, F.R.S. 


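In a memoir published in 1907* I have shown how, on the hypothesis of normal distribution,
the true correlation of variates r may be ascertained from the correlation ρ of grades. If
g₁ and g₂ be the two grades, ν₁ and ν₂ the corresponding ranks, x and y the corresponding variates
with means x̄ and ȳ, and standard-deviations σ₁ and σ₂, while

$$z = \frac{N}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}}\,\exp\left\{-\frac{1}{2(1-r^2)}\left[\frac{(x-\bar x)^2}{\sigma_1^2} - \frac{2r(x-\bar x)(y-\bar y)}{\sigma_1\sigma_2} + \frac{(y-\bar y)^2}{\sigma_2^2}\right]\right\}$$

is the normal frequency surface of the variates, then

$$\bar g_1 = \tfrac{1}{2}N = \bar g_2, \qquad \sigma_{g_1}^2 = \sigma_{g_2}^2 = \tfrac{1}{12}N^2,$$
$$g_1 - \bar g_1 = \frac{N}{\sqrt{2\pi}\,\sigma_1}\int_0^{x-\bar x} e^{-\frac{1}{2}x^2/\sigma_1^2}\,dx,$$
$$\sigma_{\nu_1}^2 = \sigma_{\nu_2}^2 = \tfrac{1}{12}(N^2-1).$$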


Further I showed in the memoir just cited that

$$r = 2\sin\left(\frac{\pi}{6}\rho\right),$$

where a convenient method of finding ρ was by the formula

$$\rho = 1 - \frac{6\,S(g_1-g_2)^2}{N^3},$$

or again by

$$\rho = 1 - \frac{6\,S(\nu_1-\nu_2)^2}{N(N^2-1)}.$$


* “On Further Methods of Determining Correlation,” Drapers’ Company Research Memoirs (Dulau
and Co.), pp. 11, 12.
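As a small numerical illustration of the passage from rank correlation to variate correlation, the sketch below assumes that the rank formula is the familiar rank-difference expression ρ = 1 − 6S(ν₁ − ν₂)²/N(N² − 1) for untied ranks, and applies the conversion r = 2 sin(πρ/6). The five pairs of ranks are invented for the purpose and the function names are illustrative.

```python
import math


def rank_correlation(ranks_1, ranks_2):
    """rho = 1 - 6 S(v1 - v2)^2 / (N (N^2 - 1)), for untied ranks 1..N."""
    n = len(ranks_1)
    s = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * s / (n * (n * n - 1))


def variate_correlation(rho):
    """Conversion r = 2 sin(pi * rho / 6), valid on the normal hypothesis."""
    return 2 * math.sin(math.pi * rho / 6)


# Hypothetical ranks of five individuals in two characters.
rho = rank_correlation([1, 2, 3, 4, 5], [2, 1, 3, 5, 4])
r = variate_correlation(rho)
print(round(rho, 3), round(r, 3))
```

The variate correlation is slightly larger than the rank correlation for intermediate values, and the two agree at 0 and at ±1.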


The problem has recently occurred of dealing with data where: 
(i) One variate is given quantitatively, the other variate is given by ranks. 


For example, place in school-class has to be considered in relation to marks in examination, 
or the rank in a teacher’s general appreciation has to be considered in relation to marks in 
examination. 


(ii) One variate is given by broad categories, the other by ranks. 


For example, five or six categories of general intelligence are given as the basis of the
teacher’s classification of intelligence, and this has to be considered with regard to rank in,
say, class or examination, possibly with regard to a special subject.


We require in both cases to deduce from the data the true variate correlation. 


Case (i). Let x be the character measured by its grade, y the character given quantitatively.
Then with the notation above, if ρ′ equal the correlation of grade and of variate, r the correlation
of the two variates:


1 Pay 
ar Noyo,’ 
where 
$2 [te 2 = 
Pay= | a —_ 2(Y¥-Y)(W-H) aedy, 
pz, , +0 [to 5 0 
Pot is ie (y-¥)%q ap rey: 


Integrating by parts after putting ȳ = 0 and writing*

$$\frac{dz}{dr} = \sigma_1\sigma_2\,\frac{d^2z}{dx\,dy},$$
qh aco pets di, dz 
‘Baw. | ie 10x F a 


Integrating again by parts: 


dr Lol Sone 
: ry) 
+0 [+o = 
sos] al e ~ozdady 
aoe 
2-r 2rx'y’ y? 
2 [+o [+o 4 12, 
__ o9iV? Ie | 1 = yes ee =) dx dy 
NOR. —« 2m FS 
o, N? 1 o V2 


oot i Nr 
re” ary 


Hence dip! _ 7 (ae \= NV = 
No204, Wr oy, 


dr adr 
* Phil. Trans. A., Vol. 195, p. 25.

Thus since ρ′ vanishes with r,

$$\rho' = \sqrt{\frac{3}{\pi}}\;r.$$

Thus finally

$$r = \sqrt{\frac{\pi}{3}}\;\rho' = 1.0233\,\rho',$$


-) 
ne aoe 1-0233 p”. 

It will be clear from this that the correlation ρ′ between rank and quantitative variate can
never be “perfect,” for it cannot exceed the value .9772, otherwise the correlation r would exceed
unity. It will be seen that for practical purposes r is very close to ρ′, but still from the
theoretical standpoint, it is not without interest to discover that the correlation between
ranks and a quantitative variate can never be perfect. For example, it is impossible to have
perfect correlation between place in class and examination test, even if the boys were in the
same order in class and examination. The defect, however, will be very slight.
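The constants 1.0233 and .9772 can be checked numerically. The sketch below is an independent verification under the stated normal hypothesis, not the analytical route of the note: it takes the grade on a unit scale as Φ(x), whose variance is 1/12, uses the fact that the covariance of the grade with a correlated normal variate y is r·E[xΦ(x)], and evaluates that expectation by the trapezoidal rule. The function names and the integration limits are choices made here for illustration.

```python
import math


def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)


def Phi(t):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))


def grade_variate_factor(step=0.001, limit=8.0):
    """Ratio rho'/r for a grade Phi(x) against a normal variate y.

    cov(grade, y) = r * E[x * Phi(x)] and the grade has variance 1/12,
    so rho'/r = sqrt(12) * E[x * Phi(x)].  The expectation is found by
    the trapezoidal rule on [-limit, limit].
    """
    n = int(round(2 * limit / step))
    total = 0.0
    for i in range(n + 1):
        t = -limit + i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * t * Phi(t) * phi(t)
    return total * step * math.sqrt(12)


factor = grade_variate_factor()
print(round(factor, 4), round(1 / factor, 4))
```

The first figure printed is the greatest value ρ′ can take; the second is the multiplier that carries ρ′ to r.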


Case (ii). Let the subscript C refer to any “broad” class and let η be found from either
of the formulae


12 1 — =\2 
ae S {te Gog) 


ee 12 — = 
or 7 WV (¥2 —1) Be a 


the first applying to grades and the second to ranks; then 


12 ges 
r= 1:0233 wi Wa 8 (ne Go- 9)", 


Ghee = 
ee = 10233 rl n S {te Vo-v)}; 


according as grades or ranks are used. In actual practice the values of η′ or η″ should be
corrected for number of classes and for “broad” categories. See Biometrika, Vol. VIII. p. 256 and
Vol. IX. p. 118.


Numerical illustrations will be provided later. 


III. Correction of a Misstatement by Mr Major Greenwood, Junior. 


In a recent paper by Mr Major Greenwood and Mrs Frances Wood “On changes in the
Recorded Mortality from Cancer and their Possible Interpretation*” occur the following words:
“The case is evidently analogous to that studied by Professor Karl Pearson in his pamphlet,
The Fight against Tuberculosis and the Death-rate from Phthisis (Dulau and Co.). Professor
Pearson published three diagrams: (a) the general death-rate of England and Wales; (b) the
phthisis death-rate; (c) the ratio of phthisis deaths to all deaths. The original figures seem to
have been the crude rate for males and females separately from 1835 onwards.” The “evident
analogy” with what appears to me the wholly fallacious treatment of the authors in their paper
above cited I do not now stay to discuss, but I wish to draw attention to the words: “The


* Royal Society of Medicine, Proceedings, Vol. VII. Section of Epidemiology, pp. 79–170.
March 27, 1914.
original figures seem to have been the crude rate for males and females separately from 1835
onwards.” Why the writer of these words should have assumed them without any inquiry
of me, or any examination of the values of the crude death-rates (which are accessible to everybody)
to be “crude death-rates,” I do not know, but they illustrate his readiness to form a
biased judgment when his feelings are stirred by unfavourable criticism. As a matter of fact
the rates were standardised rates reduced to the population of 1901, and most kindly
provided at my special request by the General Register Office. It is of interest to observe that
Dr Weinberg of Stuttgart recently made precisely the same charge as Mr Major Greenwood
with the same over-hasty assumption that the reality must be the desired, if undemonstrated,
error*. With the German as with other foes, it is well to leave ample opportunity for their
assuming you to be foolish; their assumption may lead them to run against hard reality.


K. P.


IV. Note on Reproductive Selection. 


By DAVID HERON, D.Sc. 


The fact that in the case of man† fifty per cent. of one generation comes from twenty-five
per cent. of the preceding one was first noted by Professor Karl Pearson in the Chances of
Death (Vol. I. p. 80) and in dealing more fully with this important generalisation in the Groundwork
of Eugenics, p. 27, he said: “It is very difficult from any English statistics to determine
how many adults never marry. No information on this point is asked in the death schedule for 
males ; it is asked but imperfectly answered in the case of the schedule for females.” In 
a footnote he adds: “The Registrar-General informs me that the record of civil condition 
in the case of female deaths is worthless and that no useful return can be made from it.” 
He found that in the Argentine and in Scotland 60 per cent. died unmarried, in the United 
States 51 per cent., and from the last two English Censuses and the Annual Reports 48 per 
cent., and added “ This indirect method of reaching the result is, however, not very satisfactory. 
We may, I think, conclude in round numbers that 40 per cent. of the population dies before it 
reaches the age of 21 and that probably another 20 per cent. are never married.” On this 
assumption Professor Pearson proceeds to show that “about 12 per cent. of all the individuals 
born in the last generation provide half the next generation.” 

Some data published in Bulletin of Population and Vital Statistics No. 30 for the Commonwealth
of Australia (Tables 48 and 84 a and b) prove that the assumptions made lie very close to
the facts. The data are shown in the following table which gives the conjugal condition and
issue of the males and females who died in Australia in 1912. From this we find that half the
total number of children came from 3337 of the parents (all those who had at least 9 children
and part of those who had each 8 children). It thus appears that of the males 17,404 out
of 30,285 = 57.5% died unmarried while half the total offspring came from 25.9% of those who
married and 11.0% of the whole number of males, so that approximately three-fifths of the
males born die unmarried and one-half of one generation comes from one-quarter of the married
population or from one-ninth of all the males born in the preceding generation. The diagram
gives a graphical illustration of the argument.

In exactly the same way we find that nearly one-half of the females born in Australia die 
unmarried and that one-half of one generation comes from one-quarter of the married and from 
one-seventh of all the females born in the preceding generation. 
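The figures quoted above follow from the table by a short accumulation. The sketch below is one way of doing the reckoning, assuming (as the text indicates) that families are taken in descending order of size and that only part of the marginal size-group is counted; the dictionary and function names are illustrative, while the counts are those of the table.

```python
# Deaths in Australia, 1912: married persons by number of children
# (family size -> number of deceased parents), from the table.
MALES = {0: 1422, 1: 1036, 2: 1098, 3: 1127, 4: 1147, 5: 1070, 6: 1058,
         7: 1040, 8: 992, 9: 819, 10: 801, 11: 473, 12: 394, 13: 196,
         14: 109, 15: 50, 16: 27, 17: 7, 18: 5, 19: 6, 20: 3, 23: 1}
FEMALES = {0: 1317, 1: 1083, 2: 992, 3: 1050, 4: 1001, 5: 976, 6: 1013,
           7: 974, 8: 881, 9: 799, 10: 622, 11: 469, 12: 314, 13: 193,
           14: 101, 15: 57, 16: 22, 17: 8, 18: 3, 19: 2, 20: 2, 21: 1,
           22: 1}
SINGLE_MALES, SINGLE_FEMALES = 17404, 10011


def half_generation(married, single):
    """Parents needed, largest families first, to supply half the children.

    Returns (parents, share of married, share of all deaths,
    share of all deaths that were unmarried).
    """
    target = sum(size * count for size, count in married.items()) / 2
    parents = 0.0
    children = 0.0
    for size in sorted(married, reverse=True):
        block = size * married[size]
        if children + block >= target:
            # Only part of this size-group is needed to reach one half.
            parents += (target - children) / size
            break
        parents += married[size]
        children += block
    n_married = sum(married.values())
    n_all = n_married + single
    return parents, parents / n_married, parents / n_all, single / n_all


m_parents, m_of_married, m_of_all, m_single = half_generation(MALES, SINGLE_MALES)
f_parents, f_of_married, f_of_all, f_single = half_generation(FEMALES, SINGLE_FEMALES)

print(round(m_parents), round(100 * m_of_married, 1),
      round(100 * m_of_all, 1), round(100 * m_single, 1))
```

For the males this gives 3337 parents, 25.9 per cent. of the married, 11.0 per cent. of all, with 57.5 per cent. dying unmarried, in agreement with the text; the female figures come out at roughly one-quarter of the married and one-seventh of all.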


* Archiv für Rassen- und Gesellschafts-Biologie, IX. Jahrgang, S. 87. Leipzig, 1912.
† It has also been dealt with in various mammals. See the Groundwork of Eugenics, Eugenics
Lecture Series II (Dulau and Co.), p. 29.

Conjugal     Children in    Deaths of     Total      Deaths of     Total
Condition    Each Family      Males      Children     Females     Children

Single            0           17404         —          10011         —

Married           0            1422         —           1317         —
   „              1            1036        1036         1083        1083
   „              2            1098        2196          992        1984
   „              3            1127        3381         1050        3150
   „              4            1147        4588         1001        4004
   „              5            1070        5350          976        4880
   „              6            1058        6348         1013        6078
   „              7            1040        7280          974        6818
   „              8             992        7936          881        7048
   „              9             819        7371          799        7191
   „             10             801        8010          622        6220
   „             11             473        5203          469        5159
   „             12             394        4728          314        3768
   „             13             196        2548          193        2509
   „             14             109        1526          101        1414
   „             15              50         750           57         855
   „             16              27         432           22         352
   „             17               7         119            8         136
   „             18               5          90            3          54
   „             19               6         114            2          38
   „             20               3          60            2          40
   „             21               —          —             1          21
   „             22               —          —             1          22
   „             23               1          23            —          —

Totals                        30285       69089        21892       62824


Diagram to illustrate the fact that three-fifths of those born die unmarried and that one-ninth of 
one generation produce one-half of the next. (Deduced from records of males.)


By H. WAITE, M.A., B.Sc.


1. Introduction. Certain papers have been published in recent years giving
the results of research on the variability and correlation of the hand, notably
(1) “A First Study of the Variability and Correlation of the Hand,” by
Miss M. A. Whiteley, B.Sc., and Karl Pearson, F.R.S., Proceedings of the Royal
Society, Vol. 65, pp. 126–151, and (2) “A Second Study of the Variability
and Correlation of the Hand,” by M. A. Lewenz, B.A., and M. A. Whiteley, B.Sc.,
Biometrika, Vol. I, pp. 345–360. In the former the writers urge “the importance
of putting on record all the quantitative measures we can possibly ascertain
of variability and correlation” of characters of the human body. Although Finger-Prints,
the characters dealt with in the present paper, cannot strictly claim to
be quantitative it is hoped by the writer that the results may prove of some
interest and use in the solution of the great Problem of Evolution in Man,
especially when compared with the results obtained from the study of other
measurements of the hand.


The principal motive underlying most of the work which has been done in the 
past on the subject of Finger-Prints has arisen from the development of means of 
identification and it was based on the fact that the general pattern and character- 
istics of the finger-prints of any individual are persistent throughout life. As far 
as I am aware, however, no paper has yet been published attempting to measure 
the association between the various types of finger-prints in an individual or com- 
paring these with the relations which have been found to exist between other 
measurements of the hand. These are the objects of the present paper. 


2. Primary Classification of Finger-Prints. As primary classification
Purkenje proposed nine types, Galton* three (each being divided into twenty-four
sub-classes), and Henry† four, these also being sub-divided into a number of
classes. For the purposes of this paper I have adopted the method of dividing all
the prints into four primary classes; I have also adopted Henry’s definitions and
nomenclature as far as they are required, and these follow in general those of
Galton. Secondary classification with its minute details is not used in this paper.


* Fingerprint Directories, by Francis Galton, F.R.S. Macmillan, 1895.
† Classification and Uses of Finger Prints, by Sir E. R. Henry, C.V.O., C.S.I. Wyman and Sons,
Third edition, 1905.


The four classes referred to above are Arches, Loops, Whorls and Composites.
In Arches the ridges run from side to side, consecutive ridges being roughly
parallel and the curvature increasing in general from the base to the tip.
(Plate XX. Fig. i.)


In Loops some of the ridges are doubled back upon themselves making a half
turn or a little more, the two parts of the doubled ridge diverging from each other
at the centre of the pattern. (Fig. ii.) Consequently this pattern has an open
mouth directed downwards either towards the right or towards the left of the 
finger. The direction of this opening supplies a means of subdividing Loops into 
Radial and Ulnar Loops according as the direction is towards the radius or towards 
the ulna, that is, towards or away from the thumb. As will be seen later (p. 422B) 
the proportion of Radial Loops is very small except in the forefinger, so that this 
method of subdivision has been used only in dealing with that finger. 


In Whorls some of the ridges make a complete circuit, either as closed con- 
centric ovals or as a more or less continuous ridge forming a spiral. (Fig. iii.) 


Composites consist of combinations of two or more of the other patterns.
(Fig. iv.) In this class are also included those finger-prints which are too irregular
in general outline to be placed in any one of the other main groups.


This class also includes the bulk of those patterns about which Sir Francis Galton,
in his book on Finger Prints*, p. 79, states: “They are as much Loops as Whorls,
and properly ought to be relegated to a fourth class.” It is possible, however,
that some of Galton’s “ambiguous cases” may have been classed in this paper
with Loops.


For further details of these principal classes with their modifications and sub- 
divisions reference may be made to the works mentioned in the footnotes on 
p. 421. 


3. Material. The material on which this investigation is based consists of 
two thousand complete sets of finger-prints of adult males, part of a much longer 
series in the Biometric Laboratory of University College, London. They belong 
to the lower type of artisan and labouring classes. No selection whatever has been 
made, except that a few sets, which were incomplete or which contained prints so 
damaged as to be indecipherable, have been rejected. 


4. Symbols. The following symbols are used: A = Arch; SL = Small Loop;
LL = Large Loop (see p. 423); W = Whorl; C = Composite; L_r = Radial Loop;
L_u = Ulnar Loop; R = Right Hand; L = Left Hand. R₁, R₂, R₃, R₄, R₅ designate
the thumb, forefinger, middle, ring and little finger respectively of the right hand,
and L₁, L₂, L₃, L₄, L₅ represent the corresponding fingers of the left hand.


* Finger Prints, by Francis Galton, F.R.S., Macmillan, 1892.

[Plate XX. Illustrations of the four fundamental types of Finger-Print.]

5. Distribution of Classes of Finger-Prints. A preliminary survey of the prints
brings to light a considerable clustering together of prints of the same kind.
Thus, each of 241 sets contains prints of one class only; each of 329 sets has nine
prints of one class, and each of 194 sets contains eight out of the ten prints of one
class; that is, each of 764 sets, or over 38%, has at least eight prints of one class,
the large majority of these being loops. Again, each of 892 sets contains prints of
two classes only, so that each of 1133 sets, or nearly 57% of the whole, has
representatives of not more than two of the four classes. On the other hand all
four classes appear in only 95 sets, while the number of single hands, each of
which contains at least one of every class, is only 23.

For the calculations which follow it has been found advisable to subdivide the 
loops into two classes, Small Loops and Large Loops (p. 423). Considering these 
as separate classes, giving five types in all, the distribution of numbers of types 
for the two hands is shown in the following Table: 


TABLE 1.
Distribution of Types in Right and Left Hands.

                         Number of Types in Right Hand.
Number of Types
in Left Hand.        1       2       3       4       5     Totals

      1             37      84      47       6       —       174
      2             65     465     360      61       4       955
      3             15     256     347      96       2       716
      4              1      36      83      30       1       151
      5              —       1       2       1       —         4

    Totals         118     842     839     194       7      2000

In this Table, taking as origin the cell (3, 2) containing 360 types, we have the
following results:

Mean of Left Hand Types, .428;  σ = .7628.
Mean of Right Hand Types, −.435;  σ = .7608.

We thus find the correlation coefficient (r) to be .281 ± .014.


The contingency coefficient (c), corrected for the number of cells, is .289. Hence
we conclude that there is a distinct, though not very great tendency towards 
equality in the number of types in the two hands of an individual. It appears, 
however, that the divergence is rather greater in the right than in the left hand. 
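The means, standard deviations and correlation just quoted can be recomputed from Table 1. The sketch below works from the raw type-numbers 1 to 5 rather than from the working origin (3, 2), which changes the means by a constant but leaves σ and r unaltered; the probable error is taken as .6745(1 − r²)/√N, which is assumed here to be the formula behind the printed ± .014. Names are illustrative.

```python
import math

# TABLE 1: rows = number of types in the left hand (1..5),
# columns = number of types in the right hand (1..5).
TABLE_1 = [
    [37, 84, 47, 6, 0],
    [65, 465, 360, 61, 4],
    [15, 256, 347, 96, 2],
    [1, 36, 83, 30, 1],
    [0, 1, 2, 1, 0],
]


def product_moment(table):
    """Means, standard deviations, r and probable error of r."""
    cells = [(i + 1, j + 1, f)
             for i, row in enumerate(table) for j, f in enumerate(row)]
    n = sum(f for _, _, f in cells)
    mean_l = sum(l * f for l, _, f in cells) / n
    mean_r = sum(r * f for _, r, f in cells) / n
    sd_l = math.sqrt(sum(l * l * f for l, _, f in cells) / n - mean_l ** 2)
    sd_r = math.sqrt(sum(r * r * f for _, r, f in cells) / n - mean_r ** 2)
    cov = sum(l * r * f for l, r, f in cells) / n - mean_l * mean_r
    corr = cov / (sd_l * sd_r)
    probable_error = 0.6745 * (1 - corr ** 2) / math.sqrt(n)
    return mean_l, mean_r, sd_l, sd_r, corr, probable_error


mean_l, mean_r, sd_l, sd_r, corr, pe = product_moment(TABLE_1)

# Means referred to the working origin: left = 2, right = 3.
print(round(mean_l - 2, 3), round(mean_r - 3, 3))
print(round(sd_l, 4), round(sd_r, 4), round(corr, 3), round(pe, 3))
```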

The question now arises whether the difference in divergence in the two hands 
for the samples taken is significant. I have tested this by the method proposed by 
Professor Karl Pearson*. 


* “On the Probability that Two Independent Distributions of Frequency are really Samples from the
same Population,” by Karl Pearson, F.R.S., Biometrika, Vol. VIII, pp. 250–254, July, 1911.

TABLE 2.
Divergence of Types in Right and Left Hands.

                       Number of Types.
                  1       2       3       4       5     Totals
Right Hand      118     842     839     194       7      2000
Left Hand       174     955     716     151       4      2000

For this Table
χ² = 33.72,

whence P is less than .000,005.


That is, the odds are more than 200,000 to 1 against the occurrence of two such
divergent samples if they were random samples of the same population. In other
words the right hand generally tends to have a greater divergence of types than
the left.
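The test can be retraced from Table 2. The sketch below assumes that the method referred to is the two-sample criterion χ² = S{N N′ (f/N − f′/N′)² / (f + f′)}, and that P is read for five categories, i.e. four degrees of freedom in modern terms, for which the tail probability has the closed form e^(−χ²/2)(1 + χ²/2). Under those assumptions the value obtained is about 33.75, close to the printed 33.72, and P falls well below .000,005. The function names are illustrative.

```python
import math

RIGHT = [118, 842, 839, 194, 7]   # number of types 1..5, right hand
LEFT = [174, 955, 716, 151, 4]    # number of types 1..5, left hand


def two_sample_chi_square(f, g):
    """Chi-square for two frequency distributions over the same classes."""
    n, m = sum(f), sum(g)
    return sum(n * m * (a / n - b / m) ** 2 / (a + b)
               for a, b in zip(f, g) if a + b > 0)


def p_value_4df(chi2):
    """Upper tail of chi-square with 4 degrees of freedom (closed form)."""
    half = chi2 / 2
    return math.exp(-half) * (1 + half)


chi2 = two_sample_chi_square(RIGHT, LEFT)
p = p_value_4df(chi2)
print(round(chi2, 2), p < 0.000005)
```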


The following Table gives the distribution of classes of prints for the various 
fingers of both hands: 


TABLE 3.
Distribution of Classes of Prints.

                            A    L (ulnar)  L (radial)     W       C

R₁                         46      1104          1        649     200
R₂                        352       537        456        481     174
R₃                        212      1399         38        274      77
R₄                         63      1015         17        729     176
R₅                         31      1631          3        228     107

Totals                    704      5686        515       2361     734


L₁                         91      1311          3        341     254
L₂                        313       732        383        437     135
L₃                        215      1408         35        240     102
L₄                         66      1283         12        491     148
L₅                         35      1727          —        150      88

Totals                    720      6461        433       1659     727


Totals for both hands    1424     12147        948       4020    1461


The most striking feature of this Table is the uneven distribution of the
various classes, especially the large proportion of ulnar loops and the very small
number of radial loops except in the forefingers. A comparison of the distribution
in the two hands shows considerable differences; e.g., in the left thumb the number
of arches is about double the number in the right; again, the whorls in each
finger of the right hand are greatly in excess of those on the left, while the left
hand has, in every case, an excess of ulnar loops.


If we arrange the numbers of each class in order of magnitude, we see that the 
order for the arches is identical for the two hands and also for the ulnar loops. 
In each of the other classes there is one exception to the “identical” order. 


I have tested these distributions for each type by the method referred to in the
footnote of p. 422, with the following results: In the arches the odds are more
than 500 to 1 against the occurrence of two such divergent samples which are
random samples taken from the same population; in the ulnar loops the odds are
more than 200,000 to 1; in the radial loops about 5 to 2; in the whorls more than
1,000,000 to 1, and in the composites more than 1300 to 1.


We may thus fairly conclude that with the exception of the radial loops the 
frequency distribution of the classes between the fingers is different in the two 
hands and the radial loops are so few, except in the forefinger, as to be almost 
negligible. 
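
As an illustration of these per-class comparisons, the arches of Table 3 can be tested finger by finger. This is a hedged sketch: it assumes the general two-sample χ² for unequal totals and four degrees of freedom (five fingers less one), and the helper names are hypothetical. The other classes can be passed to the same function.

```python
from math import exp

def chi2_two_samples(first, second):
    """Two-sample chi-square allowing the two samples to differ in size."""
    n1, n2 = sum(first), sum(second)
    total = 0.0
    for f1, f2 in zip(first, second):
        if f1 + f2:
            total += n1 * n2 * (f1 / n1 - f2 / n2) ** 2 / (f1 + f2)
    return total

def odds_against(chi2_value):
    """Odds against a divergence this large arising by chance (4 degrees of freedom)."""
    p = exp(-chi2_value / 2) * (1 + chi2_value / 2)
    return (1 - p) / p

# Arches on thumb, index, middle, ring and little finger (Table 3).
arches_right = [46, 352, 212, 63, 31]
arches_left = [91, 313, 215, 66, 35]

arch_chi2 = chi2_two_samples(arches_right, arches_left)
arch_odds = odds_against(arch_chi2)
print(round(arch_chi2, 2), round(arch_odds))
```

For the arches the odds so obtained exceed 500 to 1, in keeping with the figure quoted above.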

6. Subdivision of Loops. The great preponderance in the number of loops
and the insignificance of the number of radial loops, except in the forefinger, make
another subdivision of this class necessary. The method adopted is as follows:
All loops, in common with whorls and composites, contain certain well-defined
points; these are (1) the “delta,” or “outer terminus,” and (2) the “point of the
core,” or “inner terminus.” [See Henry, pp. 22-24.] The number of ridges
intervening between the delta of a loop and the point of the core may be anything
from one up to about thirty; in only 38 cases out of the 13,095 loops does the
number of ridges exceed 25; two of these are over 30, one being 32 and the other
35. The complete distribution of ridges is given in Table 4a.


In dividing the loops into two sub-classes according to the number of ridges 
the nearest approach to equality is obtained by taking (a) those containing from 
1 to 12 ridges, and (b) those containing 13 or more ridges. For brevity I have
called these classes (a) Small Loops, and (b) Large Loops; the terms “Small” and 
“Large” have no reference to the relative sizes of the patterns. The numbers in 
the two groups, thus arranged, are 7033 and 6062 respectively. 


Table 4b gives (1) the number of loops for each finger, (2) the means, (3) the
standard deviations, and (4) the coefficients of variation in the numbers of ridges.
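
The constants of Table 4b can be recomputed from a column of Table 4a along the following lines. This is a sketch under stated assumptions: the probable-error formulae are the customary ones (.67449σ/√n for the mean, .67449σ/√(2n) for the standard deviation, and Pearson's expression for the coefficient of variation), which the text does not spell out, and the function name is hypothetical.

```python
from math import sqrt

PROBABLE_ERROR_FACTOR = 0.67449  # converts a standard error into a probable error

def ridge_statistics(freq):
    """Mean, S.D. and coefficient of variation, with probable errors.

    `freq` maps a ridge count to the number of loops having that count.
    """
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n
    sd = sqrt(sum(f * (x - mean) ** 2 for x, f in freq.items()) / n)
    cv = 100 * sd / mean
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "cv": cv,
        "pe_mean": PROBABLE_ERROR_FACTOR * sd / sqrt(n),
        "pe_sd": PROBABLE_ERROR_FACTOR * sd / sqrt(2 * n),
        "pe_cv": PROBABLE_ERROR_FACTOR * cv / sqrt(2 * n)
        * sqrt(1 + 2 * (cv / 100) ** 2),
    }

# Right thumb column of Table 4a, ridge counts 1 to 29.
right_thumb_counts = [3, 7, 7, 8, 24, 14, 24, 29, 25, 43, 53, 53, 65, 89, 68,
                      91, 86, 84, 100, 61, 49, 34, 33, 22, 11, 7, 7, 7, 1]
right_thumb = dict(enumerate(right_thumb_counts, start=1))

stats = ridge_statistics(right_thumb)
print(stats)
```

For the right thumb the mean and standard deviation returned agree with the first row of Table 4b to the second decimal place.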


Examining the Table below, consider first the means. The order, which is
identical in the two hands, runs:

(1) Thumb, (2) Ring Finger, (3) Little Finger, (4) Middle Finger, (5) Index.


It will be noticed that this order of the means is quite different from that 
of the relative areas of the patterns. 
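TABLE 4a.
Distribution of Ridges in Loops.

       |          Right Hand             |           Left Hand
Ridges |   R1 |   R2 |   R3 |   R4 |   R5 |   L1 |   L2 |   L3 |   L4 |   L5
   1   |    3 |   19 |   18 |    8 |    7 |    … |   20 |   21 |    5 |   10
   2   |    7 |   68 |   47 |   27 |   24 |    … |   79 |   65 |   25 |   32
   3   |    7 |   76 |   50 |   26 |   49 |    … |    … |   54 |   40 |   42
   4   |    8 |   71 |   66 |   35 |   68 |    … |    … |   60 |   33 |   56
   5   |   24 |   67 |   57 |   58 |   70 |    … |   70 |   52 |   45 |   69
   6   |   14 |   47 |   80 |   36 |   92 |    … |   54 |   47 |   32 |   69
   7   |   24 |   45 |   76 |   44 |   77 |    … |   64 |   75 |   47 |   70
   8   |   29 |   50 |   85 |   45 |   73 |    … |   67 |   70 |   34 |   91
   9   |   25 |   40 |   97 |   36 |   83 |    … |   65 |   77 |   53 |   85
  10   |   43 |   47 |  109 |   55 |   94 |    … |   73 |  117 |   61 |  122
  11   |   53 |   55 |  116 |   48 |  100 |    … |   76 |  124 |   89 |  156
  12   |   53 |   69 |  120 |   71 |  133 |    … |   77 |  125 |  119 |  157
  13   |   65 |   62 |  116 |   66 |  111 |    … |   65 |  141 |   94 |  134
  14   |   89 |   60 |  128 |   76 |  143 |    … |   67 |  139 |  111 |  150
  15   |   68 |   60 |   84 |   79 |  103 |    … |   49 |   99 |   94 |  131
  16   |   91 |   43 |   75 |   73 |  106 |    … |   35 |   66 |  117 |  136
  17   |   86 |   38 |   62 |   60 |   97 |    … |   26 |   47 |   86 |   88
  18   |   84 |   33 |   28 |   65 |   73 |    … |   12 |   37 |   60 |   64
  19   |  100 |   16 |   11 |   34 |   46 |    … |   11 |   15 |   58 |   33
  20   |   61 |   12 |    5 |   34 |   48 |    … |    … |    6 |   33 |   17
  21   |   49 |    8 |    4 |   18 |   19 |    … |    … |    4 |   22 |    6
  22   |   34 |    2 |    2 |   19 |    3 |    … |    6 |    1 |   12 |    4
  23   |   33 |    4 |    - |    8 |    4 |    … |    … |    - |    9 |    2
  24   |   22 |    1 |    1 |    4 |    6 |    … |    1 |    1 |    8 |    1
  25   |   11 |    - |    - |    2 |    2 |    … |    - |    - |    2 |    2
  26   |    7 |    - |    - |    1 |    2 |    … |    - |    - |    1 |    -
  27   |    7 |    - |    - |    1 |    1 |    … |    - |    - |    1 |    -
  28   |    7 |    - |    - |    - |    - |    … |    - |    - |    1 |    -
  29   |    1 |    - |    - |    … |    - |    … |    - |    - |    … |    -
  30   |    - |    - |    - |    … |    - |    … |    - |    - |    … |    -
  32   |    - |    - |    - |    … |    - |    … |    - |    - |    … |    -
  35   |    - |    - |    - |    … |    - |    … |    - |    - |    … |    -
Totals | 1105 |  993 | 1437 | 1032 | 1634 | 1314 | 1115 | 1443 | 1295 | 1727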
TABLE 4b.

              | No. of Loops |          Means            |   Standard Deviations   | Coefficients of Variation
              |   R  |   L   |      R      |      L      |     R      |     L      |      R       |      L
Thumb         | 1105 | 1314  | 15.52 ± .10 | 13.27 ± .09 | 5.17 ± .07 | 4.63 ± .06 | 34.34 ± .53  | 34.85 ± .51
Index         |  993 | 1115  |  9.69 ± .12 |  8.83 ± .10 | 5.41 ± .08 | 4.88 ± .07 | 55.82 ± 1.08 | 55.24 ± 1.00
Middle Finger | 1437 | 1443  | 10.41 ± .08 | 10.55 ± .08 | 4.46 ± .06 | 4.53 ± .06 | 42.80 ± .63  | 42.91 ± .63
Ring Finger   | 1032 | 1295  | 12.37 ± .12 | 12.77 ± .10 | 5.48 ± .08 | 5.09 ± .07 | 44.31 ± .78  | 39.85 ± .61
Little Finger | 1634 | 1727  | 11.75 ± .08 | 11.53 ± .07 | 4.97 ± .06 | 4.46 ± .05 | 42.30 ± .58  | 38.71 ± .50
Comparing the two hands we see that the differences in the middle, ring and
little fingers are insignificant; in the thumb and index, however, there is a marked
difference in favour of the right hand.


The order of the standard deviations in the right hand is: 
(1) Ring Finger, (2) Index, (3) Thumb, (4) Little Finger, (5) Middle Finger. 


In the left hand the order of the last two is reversed, but the difference is 
small. 


With the exception of the middle finger, where the difference between the two
hands is only about equal to the probable error and is therefore insignificant, the
standard deviation is in every case greater for the right hand than for the left;
the differences are all of the same order of magnitude and range from about
.39 to .54.
Coming now to the coefficients of variation—the order in the right hand is: 
(1) Index, (2) Ring Finger, (3) Middle Finger, (4) Little Finger, (5) Thumb. 
In the left hand the order of the ring and middle fingers is interchanged. 


Comparing the two hands we see that in three cases—the thumb, index, and 
middle finger—the differences are each less than the probable errors; in the other 
two cases the variability is considerably greater in the right hand than in the left. 


I have carefully revised the calculations involved but have been unable to 
detect any error; neither can I suggest a reason for the large differences. 


In “ A First Study of the Variability and Correlation of the Hand” (see p. 421), 
the writers find that the variability of bone lengths is closely related to the 
relative utility of the fingers, the least variability being that of the most useful 
finger. There appears, however, to be no such simple relationship between the 
ridges of the loops and the relative utility of the fingers. 


I have compared the distribution of ridges in the loops of the thumbs by
Professor Pearson’s method (p. 422, footnote), which gives χ² = 166.64; hence the
odds are much greater than 1,000,000 to 1 against the occurrence of two such
divergent samples if they were random samples taken from the same population.


The distribution, absolute and percentage, of the five groups is now as
follows (Table 5).

TABLE 5.

                      |    Arches    | Small Loops  | Large Loops  |    Whorls    |  Composites
Digit                 |  No. |   %   |  No. |   %   |  No. |   %   |  No. |   %   |  No. |   %
R1                    |   46 |  2.30 |  290 | 14.50 |  815 | 40.75 |  649 | 32.45 |  200 | 10.00
R2                    |  352 | 17.60 |  654 | 32.70 |  339 | 16.95 |  481 | 24.05 |  174 |  8.70
R3                    |  212 | 10.60 |  921 | 46.05 |  516 | 25.80 |  274 | 13.70 |   77 |  3.85
R4                    |   63 |  3.15 |  489 | 24.45 |  543 | 27.15 |  729 | 36.45 |  176 |  8.80
R5                    |   31 |  1.55 |  870 | 43.50 |  764 | 38.20 |  228 | 11.40 |  107 |  5.35
Totals                |  704 |       | 3224 |       | 2977 |       | 2361 |       |  734 |

L1                    |   91 |  4.55 |  547 | 27.35 |  767 | 38.35 |  341 | 17.05 |  254 | 12.70
L2                    |  313 | 15.65 |  833 | 41.65 |  282 | 14.10 |  437 | 21.85 |  135 |  6.75
L3                    |  215 | 10.75 |  887 | 44.35 |  556 | 27.80 |  240 | 12.00 |  102 |  5.10
L4                    |   66 |  3.30 |  583 | 29.15 |  712 | 35.60 |  491 | 24.55 |  148 |  7.40
L5                    |   35 |  1.75 |  959 | 47.95 |  768 | 38.40 |  150 |  7.50 |   88 |  4.40
Totals                |  720 |       | 3809 |       | 3085 |       | 1659 |       |  727 |

Totals for both hands | 1424 |       | 7033 |       | 6062 |       | 4020 |       | 1461 |

In comparing the large and small loops it will be seen that in both hands there
is an excess of large loops in the thumb, ring and little fingers, and an excess
of small loops in the index and middle fingers. The order of these classes agrees
in the two hands with one exception in each case.

An approximate measure of the relationship existing between the various
combinations of digits is given by the number of cases in which two particular
digits on the same or on opposite hands have the same pattern. Table 6a gives
the percentages for the same hand and for digits of the same name on opposite
hands; the readings for other combinations of digits on opposite hands are given
in Table 6d, p. 431, where all the patterns are grouped in three classes for the
sake of comparison with Galton’s results.


Remarks on Table 6a. (a) The percentages vary greatly with different 
combinations and with different patterns. 


(b) The means and totals for digits of the same name on opposite hands are
all much greater than the corresponding readings for the right or for the left
hand; the means, with one exception, and also the totals for particular combinations
on the left hand are all greater than the corresponding readings for the
right.

(c) The order of magnitude of the totals is nearly the same for the two hands,
those of the combinations including the thumb being, with one exception in each
hand, the lowest. Hence, judging the relationship by the totals, it appears that
(1) digits of the same name on opposite hands are the most closely related, the
magnitude falling in order from the little fingers to the thumbs; (2) omitting
the thumbs, two consecutive digits are generally more closely related than others
more widely separated; (3) the digits of the left hand are more closely related
than those of the right.


(d) The relationship between the thumb and any other digit seems to be less
close than that between any pair of digits not including the thumb; also, in both
hands, the thumb appears to be most closely related to the ring finger, then to
the little finger, next to the middle and least to the fore-finger.

TABLE 6a.
Percentage of Cases in which various pairs of Digits possess the same Class of Pattern.

                             |              Right Hand                 |               Left Hand
Couplet                      |  A  |  SL  |  LL  |  W   |  C  | Totals |  A  |  SL  |  LL  |  W   |  C  | Totals
Thumb and fore-finger        | 1.5 |  6.3 |  7.5 | 13.0 | 1.4 |  29.7  | 2.4 | 15.7 |  7.2 |  8.3 | 1.3 |  34.9
Thumb and middle finger      | 1.3 |  8.7 | 11.2 |  8.0 |  .7 |  29.9  | 1.6 | 16.2 | 13.4 |  4.9 | 1.4 |  37.5
Thumb and ring finger        |  .7 |  6.5 | 12.4 | 17.8 | 1.1 |  38.5  |  .7 | 13.2 | 17.1 |  7.9 | 1.5 |  40.4
Thumb and little finger      |  .4 |  9.9 | 15.0 |  6.9 |  .9 |  33.1  |  .7 | 19.0 | 15.7 |  2.7 |  .9 |  39.0
Fore-finger and middle finger| 6.4 | 22.7 |  7.2 |  8.9 | 1.2 |  46.4  | 5.8 | 26.3 |  7.9 |  8.6 | 1.2 |  49.8
Fore-finger and ring finger  | 2.3 | 13.7 |  6.4 | 16.3 |  .9 |  39.6  | 2.5 | 17.8 |  7.1 | 12.7 | 1.5 |  41.6
Fore-finger and little finger| 1.2 | 20.1 |  9.1 |  6.8 |  .9 |  38.1  | 1.3 | 26.3 |  8.2 |  4.3 |  .3 |  40.4
Middle and ring finger       | 2.5 | 16.8 |  9.1 | 12.0 |  .7 |  41.1  | 2.6 | 21.1 | 15.4 |  8.8 |  .8 |  48.7
Middle and little finger     | 1.0 | 26.8 | 14.0 |  4.6 |  .4 |  46.8  | 1.2 | 28.9 | 15.3 |  2.9 |  .6 |  48.9
Ring and little finger       |  .8 | 20.0 | 15.7 | 10.0 | 1.0 |  47.5  |  .9 | 23.7 | 19.5 |  6.0 |  .8 |  50.9
Means                        | 1.8 | 15.2 | 10.8 | 10.4 |  .9 |  39.1  | 2.0 | 20.8 | 12.7 |  6.7 | 1.0 |  43.2

Couplet                      |  A  |  SL  |  LL  |  W   |  C  | Totals
Two thumbs                   | 1.6 | 10.2 | 23.4 | 13.5 | 2.7 |  51.3
Two fore-fingers             | 9.3 | 22.7 |  5.6 | 14.4 | 1.4 |  53.4
Two middle fingers           | 5.8 | 31.8 | 14.8 |  7.0 |  .9 |  60.3
Two ring fingers             | 1.9 | 18.2 | 18.4 | 21.2 | 1.5 |  61.2
Two little fingers           |  .9 | 36.1 | 27.2 |  5.0 | 1.4 |  70.6
Means                        | 3.9 | 23.8 | 17.9 | 12.2 | 1.6 |  59.3

Another method of investigating the approximate relationship between the
various digits is by means of a “centesimal” scale, as in Galton’s Finger Prints,
Ch. VII. Table 6b gives such scale readings for small loops, large loops and
whorls, for pairs of digits on the same hand and for digits of the same name on
opposite hands. I have not considered it necessary to include other couplets
from opposite hands, because, as is shown later, the relationship between any
pair of digits from opposite hands is practically the same as between the corresponding
pair on the same hand. I have also omitted arches and composites from
this part of the inquiry as the numbers belonging to these classes are, as a rule,
comparatively small.
The scale reading for any pair of digits is calculated as follows:

Take, for example, the whorls on the right thumb and right fore-finger; the
former has 32.5 and the latter 24 per cent. of whorls, while 13 per cent. of right
hands have whorls on both thumb and fore-finger. Now from independent probability
we shall expect $\frac{32.5 \times 24}{100 \times 100} \times 100$, or 7.8 per cent. of “double whorls” in
this combination of digits and we therefore conclude that the remaining 5.2 per
cent. of double whorls are due to a relationship between the digits. If we set
aside the 7.8 per cent. out of the 32.5 and 24, we see that from the remaining
24.7 and 16.2 per cent., the greatest possible percentage of double whorls would
be 16.2; but as the actual percentage in addition to the 7.8 is 5.2, the centesimal
measure of the relationship is $\frac{5.2 \times 100}{16.2}$, or 32, to the nearest unit.
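
The arithmetic of this worked example can be put in a few lines. The function name and argument names are hypothetical; the steps are those just described (expected percentage from independent probability, observed excess over it, and greatest possible excess).

```python
def centesimal_measure(pct_first, pct_second, pct_both):
    """Centesimal measure of relationship between two digits for one pattern.

    pct_first, pct_second: percentage of the pattern on each digit.
    pct_both: percentage of hands showing the pattern on both digits.
    """
    expected = pct_first * pct_second / 100            # independent probability
    excess = pct_both - expected                       # ascribed to relationship
    greatest_excess = min(pct_first, pct_second) - expected
    return 100 * excess / greatest_excess

# Whorls on the right thumb (32.5 per cent.) and right fore-finger (24 per cent.),
# with 13 per cent. of right hands having whorls on both.
measure = centesimal_measure(32.5, 24, 13)
print(round(measure))
```

With the figures of the example the sketch gives 32 to the nearest unit.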


TABLE 6b.
Approximate Measures of Relationship between various pairs of Digits
on a Centesimal Scale.

                          |       Right Hand        |        Left Hand
Couplet                   | SL | LL | W  | Means    | SL | LL | W  | Means
Thumb and fore-finger     | 16 |  6 | 32 |   18     | 27 | 21 | 34 |   27
Thumb and middle finger   | 26 |  0 | 38 |   21     | 26 | 16 | 28 |   23
Thumb and ring finger     | 27 |  8 | 29 |   21     | 27 | 16 | 29 |   24
Thumb and little finger   | 44 |  0 |  … |    …     |  … |  4 |  … |    …
Fore and middle fingers   | 44 | 22 | 54 |   40     | 34 | 39 | 64 |   46
Fore and ring fingers     | 35 | 15 | 49 |   33     | 33 | 23 | 44 |   33
Fore and little fingers   | 32 | 25 | 47 |   35     | 29 | 32 | 46 |   36
Middle and ring fingers   | 42 | 11 | 80 |   44     | 50 | 31 | 64 |   48
Middle and little fingers | 29 | 26 | 31 |   29     | 33 | 27 | 30 |   30
Ring and little fingers   | 67 | 32 | 81 |   60     | 64 | 26 | 74 |   55

Couplet                   | SL | LL | W  | Means
Two thumbs                | 59 | 34 | 69 |   54
Two fore-fingers          | 48 | 27 | 55 |   43
Two middle fingers        | 47 | 41 | 52 |   47
Two ring fingers          | 64 | 50 | 78 |   64
Two little fingers        | 67 | 53 | 62 |   61

Most of the remarks on Table 6a will be found applicable to Table 6b, with,
at most, but slight modification; the chief differences are that the relationship
between the middle and little fingers is not so high in Table 6b as in Table 6a,
and the order for pairs of like digits is not the same in the two Tables.
Comparison of results with those of Galton. In order to compare with Galton’s
results it is necessary to put large and small loops into one class and to include
composites with whorls. Making some allowance for the difference of classification,
and for any slight variation which may be due to the fact that the material is
drawn from very different classes of the population, it will be found that there
is almost perfect agreement between our data on all essential points.


The relative frequency found in the two investigations was:

          Galton            Waite
Arches     6.5 per cent.     7.1 per cent.
Loops     67.5    „         65.5    „
Whorls    26.0    „         27.4    „
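
The regrouping of the five classes into Galton's three can be checked from the grand totals of Table 5. The sketch below is illustrative only; the dictionary keys are labels chosen here.

```python
# Totals for both hands, from Table 5.
totals = {
    "Arches": 1424,
    "Small Loops": 7033,
    "Large Loops": 6062,
    "Whorls": 4020,
    "Composites": 1461,
}
n_digits = sum(totals.values())

# Large and small loops form one class; composites are included with whorls.
three_class = {
    "Arches": totals["Arches"],
    "Loops": totals["Small Loops"] + totals["Large Loops"],
    "Whorls": totals["Whorls"] + totals["Composites"],
}
percentages = {name: 100 * count / n_digits for name, count in three_class.items()}
print(percentages)
```

This gives about 7.1, 65.5 and 27.4 per cent. for arches, loops and whorls on the 20,000 digits.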


The differences are small in comparison with some found by Galton when 
examining the finger-prints of different races. For example, 1332 Hebrew 
children had arches on the right fore-finger in 13.6 per cent. of the cases, while
only 7.9 per cent. of 250 English children had arches on that finger.


TABLE 6c.
Percentage Frequency of Arches, Loops and Whorls on the different Digits.

GALTON*: from observations of the 5000 digits of 500 persons.
WAITE: from observations of 20000 digits of 2000 persons.

              |            GALTON*            |                  WAITE
              | Arches  |  Loops    | Whorls   |   Arches    |     Loops     |    Whorls
Digit         |  R |  L |  R  |  L  |  R  |  L |  R   |  L   |   R   |   L   |   R   |   L
Fore-finger   | 17 | 17 |  53 |  53 |  30 | 30 | 17.6 | 15.7 |  49.7 |  55.7 |  32.7 |  28.6
Middle finger |  7 |  8 |  78 |  76 |  15 | 16 | 10.6 | 10.7 |  71.9 |  72.2 |  17.5 |  17.1
Little finger |  1 |  2 |  86 |  90 |  13 |  8 |  1.5 |  1.7 |  81.7 |  86.4 |  16.8 |  11.9
Thumb         |  3 |  5 |  53 |  65 |  44 | 30 |  2.3 |  4.5 |  55.2 |  65.7 |  42.5 |  29.8
Ring finger   |  2 |  3 |  53 |  66 |  45 | 31 |  3.2 |  3.3 |  51.6 |  64.7 |  45.2 |  32.0
Totals        | 30 | 35 | 323 | 350 | 147 |115 | 35.2 | 35.9 | 310.1 | 344.7 | 154.7 | 119.4

* From Finger Prints, p. 116, Table II.

Galton arranged the digits as in Table 6c, in order to bring out certain
peculiarities. He says:

“The digits are seen to fall into two well-marked groups; the one including the fore, middle,
and little fingers, the other including the thumb and ring finger. As regards the first group, the
frequency with which any pattern occurs in any named digit is statistically the same, whether
that digit be on the right or on the left hand; as regards the second group, the frequency differs
greatly in the two hands. But though in the first group the two fore-fingers, the two middle,
and the two little fingers of the right hand are severally circumstanced alike in the frequency
with which their various patterns occur, the difference between the frequency of the patterns
on a fore, a middle, and a little finger, respectively, is very great.

“In the second group, though the thumbs on opposite hands do not resemble each other in
the statistical frequency of the A. L. W. patterns, nor do the ring fingers, there is a great
resemblance between the respective frequencies in the thumbs and ring fingers; for instance,
the whorls on either of these fingers on the left hand are only two-thirds as common as those
on the right. The figures in each line and in each column are consistent throughout in
expressing these curious differences, which must therefore be accepted as facts, and not as
statistical accidents, whatever may be their explanation.” (Galton, Finger Prints, p. 116.)


These remarks apply with equal force to my figures although the actual 
percentages differ somewhat in certain cases, the most marked being in the 
middle finger arches and the little finger whorls. 


The following points of agreement in the distribution of the patterns are also 
noticed by reference to Table 6 c. 


The frequency of arches on the fore-fingers is much greater than on any other 
of the four digits. “It amounts to 17 per cent. on the fore-fingers, while on the 
thumbs and on the remaining fingers the frequency diminishes in a ratio that 
roughly accords with the distance of each digit from the fore-finger. 


“The frequency of Loops has two maxima; the principal one is on the little 
finger, the secondary on the middle finger. 


“ Whorls are most common on the thumb and the ring-finger, most rare on the 
middle and little fingers.” (Finger Prints, p. 117.) 


In discussing radial and ulnar loops, which Galton describes as loops having 
“inner” and “outer” slopes, respectively, he says :— 


“In all digits except the fore-fingers, the inner slope is much the more rare of
the two; but in the fore-fingers the inner slope appears two-thirds as frequently 
as the outer slope. Out of the percentage of 53 loops of the one or other kind on 
the right fore-finger, 21 of them have an inner and 32 an outer slope; out of the 
percentage of 55 loops on the left fore-finger, 21 have inner and 34 have outer 
slopes. These subdivisions 21-21 and 32-34 corroborate the strong statistical 
similarity that was observed to exist between the frequency of the several patterns 
on the right and left fore-fingers; a condition which was also found to characterise 
the middle and little fingers.” (Finger Prints, p. 118.) 


These statements are true, in general, of my Table 3, but my percentages on
the right fore-finger are 22.8 radial and 26.9 ulnar; on the left they are 19.2 and
36.6 respectively.


Close agreement is also observed in Table 6d which shows the tendency of
digits to resemble one another in their various combinations. Galton omits
combinations into which the little finger enters “because the overwhelming
frequency of loops in the little fingers would make the results of comparatively
little interest, while their insertion would greatly increase the size of the table.”
(Finger Prints, p. 119.) I have included them, however, for the sake of comparison
and completeness.


My percentages are readily obtained from Tables LVI to C in the Appendix. 
TABLE 6d.
Percentage of Cases in which the same Class of Pattern occurs in various Couplets of Digits.

                        |                    GALTON*                    |                     WAITE
                        |  Arches in  |   Loops in    |   Whorls in     |  Arches in  |   Loops in    |   Whorls in
Couplet                 | Same | Opp. | Same | Opp.   | Same | Opp.     | Same | Opp. | Same | Opp.   | Same | Opp.
Two thumbs              |   -  |   2  |   -  |  48    |   -  |  24      |   -  |  1.6 |   -  | 47.4   |   -  | 24.5
Two fore-fingers        |   -  |   9  |   -  |  38    |   -  |  20      |   -  |  9.3 |   -  | 36.2   |   -  | 20.4
Two middle fingers      |   -  |   3  |   -  |  65    |   -  |   9      |   -  |  5.8 |   -  | 60.6   |   -  | 10.5
Two ring fingers        |   -  |   2  |   -  |  46    |   -  |   …      |   -  |  1.9 |   -  | 46.3   |   -  | 27.9
Two little fingers      |   -  |   -  |   -  |   -    |   -  |   -      |   -  |   .9 |   -  | 63.2   |   -  |   …
Thumb and fore-finger   |   2  |   2  |  35  |  33    |  16  |  15      | 1.95 |   …  | 36.8 |   …    | 18.8 | 17.5
Thumb and mid-finger    |   1  |   1  |  48  |  47    |   9  |   8      |  1.4 |   …  | 47.0 | 46.7   | 10.9 | 10.5
Thumb and ring finger   |   1  |   1  |  40  |  38    |  20  |  18      |   .7 |   .6 | 41.0 | 39.4   | 20.8 | 19.0
Fore and mid-finger     |   5  |   5  |  48  |  46    |  12  |   …      |  6.1 |  5.5 |   …  |   …    |   …  |   …
Fore and ring finger    |   2  |   2  |  35  |  35    |  17  |  17      |  2.4 |  2.3 | 36.5 | 35.7   | 20.8 | 20.2
Middle and ring finger  |   2  |   2  |  50  |  50    |  13  |  12      |  2.5 |  2.4 | 48.3 | 47.1   | 14.7 | 13.7
Thumb and little finger |   -  |   -  |   -  |   -    |   -  |   -      |  .52 |  .45 | 54.2 | 53.6   |  8.8 |  8.1
Fore and little finger  |   -  |   -  |   -  |   -    |   -  |   -      | 1.20 | 1.15 | 47.7 | 47.2   |  9.3 |  8.1
Middle and little finger|   -  |   -  |   -  |   -    |   -  |   -      |  1.1 |  1.0 | 63.9 | 63.5   |  6.5 |  6.1
Ring and little finger  |   -  |   -  |   -  |   -    |   -  |   -      |   .8 |   .7 |   …  | 54.8   | 12.9 | 11.8

(“Same” and “Opp.” denote digits on the same hand and on opposite hands respectively.)
In commenting on his results in Table 6d, Galton says: “The agreement
in the above entries is so curiously close as to have excited grave suspicion that
it was due to some absurd blunder, by which the same figures were made inadvertently
to do duty twice over, but subsequent checking disclosed no error.
Though the unanimity of the results is wonderful, they are fairly arrived at, and
leave no doubt that the relationship of any one particular digit, whether thumb,
fore, middle, ring or little finger, to any other particular digit, is the same, whether
the two digits are on the same or on opposite hands.”


It will be noticed, however, that while exactly half of Galton’s eighteen pairs
of percentages, which are worked to the nearest unit only, are in strict agreement,
in all the other cases the result is one or two units less for two digits on opposite
hands than for the corresponding digits on the same hand. In my figures the
percentage for two digits on opposite hands is in every case the lower, and
although the differences are small, ranging only up to 1.8 while four-fifths of
them are less than 1, the consistency of the results suggests a slightly closer
relationship between a pair of digits on the same hand than between the corresponding
pair on opposite hands. This view is further supported at a later stage
of this paper. (See Remark (d) on Tables 14-16, p. 450.)

* Finger Prints, p. 120, Tables VIa and VIb.


One further comparison is of interest, namely, the measure of relationship 
between the various digits on a centesimal scale. It should be noted, however, 
that while Galton’s means are based on loops and whorls only, omitting arches 
from his three groups, mine are based on small loops, large loops and whorls, 
omitting arches and composites from my five groups; also Galton gives no results 
for those combinations which include the little finger. 


TABLE 6e.
Approximate Measures of Relationship between the various Digits,
on a Centesimal Scale.

                               | GALTON* |     WAITE
Couplet                        |  Means  | Right | Left
Thumb and fore-finger          |   24    |  18   |  27
Thumb and middle finger        |   27    |  21   |  23
Thumb and ring finger          |   39    |  21   |  24
Fore and middle finger         |   60    |  40   |  46
Fore and ring finger           |   23    |  33   |  33
Middle and ring finger         |   52    |  44   |  48
Right and left thumbs          |   61    |      54
Right and left fore-fingers    |   48    |      43
Right and left middle fingers  |   43    |      47
Right and left ring fingers    |   65    |      64

* Finger Prints, p. 129, Table VIII.

For the reasons given above we could hardly expect that these readings would
be even approximately equal, but for all that, the same general relations are seen
to hold good in the two sets of results.

It is convenient at this stage to summarize a few of the most important
points which have been brought to light in the foregoing pages. These are:

(a) A greater divergence of types in the right hand than in the left.

(b) A clustering of the same type in the hands of an individual.

(c) The uneven distribution of the various types in the different fingers,
especially the almost entire absence of radial loops except in the index.

(d) The differentiation of types in the two hands, in particular the large excess
of whorls in the right hand and of arches in the left thumb.


(e) Where there is any significant difference in the means, standard deviations 
and coefficients of variation in the numbers of ridges in the loops of the two hands 
those quantities are always greater for the right hand than for the left. 


(f) The relationship between digits of the same name on opposite hands is 
closer than that between any others; the digits of the left hand are more closely 
related than those of the right; and two consecutive digits, whether on the same 
or on opposite hands, are generally more closely related than others which are more 
widely separated. The relationship between the thumb and any other digit is less 
close than that of any pair not including the thumb. 


We may thus conclude that the left hand in its distribution of patterns is
differentiated from the right and that individual fingers are associated in a differential
way with special types. We know that the right hand is differentiated from
the left in use, and it would seem reasonable to suppose, even if we cannot account
for the adaptation to use, that the finger-prints have been differentiated in accordance
with this use differentiation.

It may be suggested that the finger-prints, if differentiated in accordance with
diversity of use of the several fingers and of each hand, follow a law of differentiated
utility and not, as the bones, a law of maximum general utility of the
finger.


7. Correlation between the Classes of Finger-Prints. The object of this section 
is to obtain the associations between the various classes of prints and on the basis 
of these associations to enquire whether any Natural Order exists in which a 
certain degree of continuity may be assumed. For a complete investigation of this 
problem fifty-five Tables are necessary. They are: 


(a) 10 Tables of Classes for the Right Hand.
(b) 10 Tables of Classes for the Left Hand.
(c) 25 Tables of Classes for the Right against the Left Hand.
(d) 10 Tables of Classes for both Hands together.


These Tables are given in the Appendix, pp. 453 et seq. 
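
The count of fifty-five can be accounted for as follows. This is an assumption about how the Tables are formed, not a statement from the text: it supposes that each Table for a single hand (or for both hands together) pairs two different classes out of the five, and that each Table of right against left pairs any class on the right with any class on the left.

```python
from itertools import combinations, product

classes = ["A", "SL", "LL", "W", "C"]  # arches, small loops, large loops, whorls, composites

# Assumed: unordered pairs of different classes, used for (a), (b) and (d).
same_hand_pairs = list(combinations(classes, 2))

# Assumed: any class on the right against any class on the left, used for (c).
cross_hand_pairs = list(product(classes, repeat=2))

n_tables = 3 * len(same_hand_pairs) + len(cross_hand_pairs)
print(len(same_hand_pairs), len(cross_hand_pairs), n_tables)
```

On this reading the four groups contain 10, 10, 25 and 10 Tables.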


The correlation coefficients and the contingencies have been calculated for the
whole of these Tables. For all the restricted Tables, I to XX, and XLVI to LV,
and in certain of the remaining Tables where the results of the other two methods
are widely divergent, the correlation ratio has also been found. In these cases
I have obtained η in both directions, the values of η given in Tables 8 and 9 being
the square root of the product of the two η’s for each Table*.


* The arithmetic instead of the geometric mean might have been taken, and there would not have
been very marked differences. But the geometric mean has the advantage of a symmetrical value, i.e.
$\eta^2 = \frac{\sigma_{m_x}\,\sigma_{m_y}}{\sigma_x\,\sigma_y}$,
which has certain analogies with a coefficient of correlation.
8a. Method of Calculating the Coefficients of Contingency in Restricted Tables.
It will be noticed that the Tables I to XX and XLVI to LV, given in the 
Appendix, differ in general character from most correlation tables since the whole 
of the cells in the lower right-hand portion are necessarily empty. Consequently 
the usual method of finding the independent probability numbers for the purpose 
of calculating contingencies is not applicable to those Tables. The method which 
has been employed was suggested to me by Professor Pearson. It is as follows: 

Consider Table VI, Appendix, p. 454, which gives the distribution of small
loops and whorls for the right hand. Commencing with the 45 hands which
contain 5 small loops each, it will be seen that the independent probability
number is the same as the observed, since a hand which has five prints of one
class can have no other. In the next column the distribution of the 211 prints by
independent probability is not in the ratio of 861 to 497 since 45 of the 861 have
already been disposed of, but in the ratio of 816 to 497, that is, the numbers in
the two rows are 131.1 and 79.9. Again in the third column from the right containing
3 small loops, 45 and 131.1 of the first row are accounted for and 79.9 of
the second row; hence the independent distribution of the 306 in the third
column is in the proportion of 684.9 : 417.1 : 292; that is, the numbers are
150.3, 91.6, and 64.1; and so on.

It should be noted that the same independent probability numbers are obtained 
if we commence at the bottom of the first column with the 50 hands each con- 
taining 5 whorls and work horizontally instead of vertically. 

The differences between these independent probability numbers and the 
observed numbers are then used to find the contingency in the same way as in 
the ordinary contingency Table. 
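
The procedure just described can be sketched as below for the Table of small loops and whorls on the right hand. The function name is hypothetical, and the code is only one way of carrying out the steps in the text: columns are taken from the right, each column total is shared among the rows still open to it in proportion to what remains of their totals, and the contingency is then found in the ordinary way without any correction for the number of cells.

```python
from math import sqrt

def independent_numbers(observed):
    """Independent probability numbers for a restricted (triangular) table.

    observed[i][j] is the number of hands with i whorls and j small loops;
    a cell is possible only when i + j does not exceed the number of digits.
    """
    k = len(observed)
    remaining = [sum(row) for row in observed]            # row totals not yet used
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    expected = [[0.0] * k for _ in range(k)]
    for j in range(k - 1, -1, -1):                        # start from the last column
        open_rows = range(k - j)                          # rows with i + j <= k - 1
        pool = sum(remaining[i] for i in open_rows)
        for i in open_rows:
            expected[i][j] = col_totals[j] * remaining[i] / pool
        for i in open_rows:
            remaining[i] -= expected[i][j]
    return expected

# Observed numbers: rows are whorls 0-5, columns are small loops 0-5.
observed = [
    [78, 144, 204, 211, 179, 45],
    [106, 153, 126, 80, 32, 0],
    [130, 92, 55, 15, 0, 0],
    [125, 38, 7, 0, 0, 0],
    [104, 26, 0, 0, 0, 0],
    [50, 0, 0, 0, 0, 0],
]
expected = independent_numbers(observed)

n_hands = sum(map(sum, observed))
chi2 = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(6) for j in range(6 - i)
)
contingency = sqrt(chi2 / (n_hands + chi2))
print(round(expected[0][4], 1), round(expected[1][4], 1), round(chi2, 2), round(contingency, 4))
```

The sketch reproduces the numbers 131.1 and 79.9 quoted above; because it carries unrounded values throughout, its χ² differs slightly in the first decimal from a hand calculation made with numbers rounded to one place.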


No correction for the number of cells has been applied to the contingency 
coefficients in this type of Table as we have, at present, no appreciation of what it 
should be. 


The complete contingency Table, worked as described, is given below. 


TABLE 7.
Contingency Table.

                                   Small Loops, R.
Whorls, R.               0         1         2         3         4         5    Totals
0    Observed           78       144       204       211       179        45       861
     β               200.6     167.4     166.6     150.3     131.1        45
     y              -122.6     -23.4      37.4      60.7      47.9         —
     y²/β            74.93      3.27      8.40     24.51     17.50         —

1    Observed          106       153       126        80        32         —       497
     β               122.2     101.9     101.4      91.6      79.9         —
     y               -16.2      51.1      24.6     -11.6     -47.9         —
     y²/β             2.15     25.63      5.97      1.47     28.72         —

2    Observed          130        92        55        15         —         —       292
     β                85.5      71.4      71.0      64.1         —         —
     y                44.5      20.6     -16.0     -49.1         —         —
     y²/β            23.16      5.94      3.61     37.61         —         —

3    Observed          125        38         7         —         —         —       170
     β                63.8      53.2      53.0         —         —         —
     y                61.2     -15.2     -46.0         —         —         —
     y²/β            58.70      4.34     39.92         —         —         —

4    Observed          104        26         —         —         —         —       130
     β                70.9      59.1         —         —         —         —
     y                33.1     -33.1         —         —         —         —
     y²/β            15.45     18.54         —         —         —         —

5    Observed           50         —         —         —         —         —        50
     β                  50         —         —         —         —         —

Totals                 593       453       392       306       211        45      2000

$\chi^2 = S(y^2/\beta) = 399.82$, $\phi^2 = \chi^2/N = .19991$, $C = \sqrt{.19991/1.19991} = .4082$.


Note on Calculation of Contingency Coefficients. It should be borne in mind
that in finding the independent probability numbers in all contingency tables, as
well as in calculating the standard deviations, it is assumed that the distribution
of the marginal totals is in the same ratio as would be the case if the whole
population were taken; in other words, that if $n_s$ is the total of an array when a
sample $N$ is taken and $m_s$ the total of the corresponding array when the whole
population $M$ is taken, then it is assumed that

$$\frac{n_s}{N} = \frac{m_s}{M}.$$

Evidently the correct value of the independent probability number in the $(s, s')$
cell of an ordinary contingency table would be

$$\frac{m_s}{M}\cdot\frac{m'_{s'}}{M}\cdot N \quad\text{or}\quad \frac{m_s\, m'_{s'}\, N}{M^2},$$

and the contributory contingency

$$\frac{\left(n_{ss'} - m_s\, m'_{s'}\, N/M^2\right)^2}{m_s\, m'_{s'}\, N/M^2}.$$

Similarly the quantities $m_s N/M$, $m'_{s'} N/M$, etc. would be the correct marginal totals
to use in finding the independent probability numbers for the restricted contingency
tables and in obtaining the standard deviations, instead of the observed totals
$n_s$, $n'_{s'}$, etc.


However, as we do not generally know $M$, $m_s$, $m'_{s'}$, etc., we are obliged to use
the observed marginal totals as the nearest approximation we can get to the
correct values, although $n_s$ is not, in general, equal to $m_s N/M$. A similar assumption
is of course always made in the formulae for the probable errors of samples, where
the sample value is put ultimately for the population value.
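
The arithmetic summarised under Table 7 can be retraced with a short sketch. It assumes that the observed numbers and the independent probability numbers are exactly those printed in Table 7, and that cells excluded by the restriction are simply omitted; the function and variable names are illustrative only and do not come from the paper.

```python
import math

# Observed numbers and independent probability numbers as read from Table 7
# (rows: whorls 0-5, right hand; columns: small loops 0-5, right hand).
# Cells made impossible by the restriction are left out of each row.
observed = [
    [78, 144, 204, 211, 179, 45],
    [106, 153, 126, 80, 32],
    [130, 92, 55, 15],
    [125, 38, 7],
    [104, 26],
    [50],
]
independent = [
    [200.6, 167.4, 166.6, 150.3, 131.1, 45.0],
    [122.2, 101.9, 101.4, 91.6, 79.9],
    [85.5, 71.4, 71.0, 64.1],
    [63.8, 53.2, 53.0],
    [70.9, 59.1],
    [50.0],
]


def mean_square_contingency(obs, ind):
    """Return (chi2, phi2, C) for matching ragged tables of numbers."""
    n = sum(sum(row) for row in obs)
    chi2 = sum(
        (o - b) ** 2 / b
        for obs_row, ind_row in zip(obs, ind)
        for o, b in zip(obs_row, ind_row)
    )
    phi2 = chi2 / n
    c = math.sqrt(phi2 / (1.0 + phi2))  # coefficient of mean square contingency
    return chi2, phi2, c


chi2, phi2, c = mean_square_contingency(observed, independent)
print(round(chi2, 2), round(phi2, 5), round(c, 4))
```

Because the independent probability numbers are entered to one decimal place only, the total may differ in the last figure from the printed 399.82.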


8b. Correlation Ratio of Restricted Tables. It is obvious that the ordinary 
method of calculating the correlation ratio also requires modification with Tables 
of this type; for this method is based on the differences between the means of 
the marginal totals and the means of the arrays. Now, in restricted Tables it 
would be impossible for the means of about half the arrays to approximate to the 
means of the marginal totals and it would be fallacious to base any conclusion on 
the deviations of the observed means from impossible values. 


A nearer approximation would be to take the pseudo $\eta$ from the formula

$$\eta_p^2 = \frac{S\{n_x(\bar y_x - {}_i\bar y_x)^2\}}{N\sigma_y^2},$$

where ${}_i\bar y_x$ is the mean of an array of the independent probability numbers; but
the denominator of this formula must be modified in such a way that in a case
of perfect association, $\eta$ = unity. The desired result is obtained if we put $\Sigma^2$
instead of $\sigma_y^2$, where

$$\Sigma^2 = \frac{SS(y - {}_i\bar y_x)^2}{N}.$$

We may write

$$\Sigma^2 = \frac{SS(y - \bar y_x + \bar y_x - {}_i\bar y_x)^2}{N}$$

$$= \frac{SS(y - \bar y_x)^2}{N} + \frac{S\{n_x(\bar y_x - {}_i\bar y_x)^2\}}{N} + \frac{2\,SS(y - \bar y_x)(\bar y_x - {}_i\bar y_x)}{N}$$

$$= \frac{S(n_x\sigma_{y_x}^2)}{N} + \frac{S\{n_x(\bar y_x - {}_i\bar y_x)^2\}}{N},$$

since the third term vanishes; hence

$$\eta^2 = \frac{S\{n_x(\bar y_x - {}_i\bar y_x)^2\}/N}{S(n_x\sigma_{y_x}^2)/N + S\{n_x(\bar y_x - {}_i\bar y_x)^2\}/N}.$$

But

$$\frac{S(n_x\sigma_{y_x}^2)}{N\sigma_y^2} = 1 - \eta_c^2 \quad\text{and}\quad \frac{S\{n_x(\bar y_x - {}_i\bar y_x)^2\}}{N\sigma_y^2} = \eta_p^2,$$

where $\eta_c$ is the crude $\eta$ found by the ordinary method.

We have, therefore, the value of the correlation ratio of restricted Tables
given by

$$\eta^2 = \frac{\eta_p^2}{1 - \eta_c^2 + \eta_p^2},$$

or

$$\eta = \frac{\eta_p}{\sqrt{1 - \eta_c^2 + \eta_p^2}}.$$
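
As a numerical check on the final formula, the following minimal sketch evaluates it for one pair of values taken from Table 8 (small loops and whorls, right hand: crude ratio .622, pseudo ratio .313). The function name and argument names are illustrative assumptions, not notation used in the paper.

```python
import math


def restricted_correlation_ratio(eta_c, eta_p):
    """Correlation ratio of a restricted table.

    eta_c: crude correlation ratio found by the ordinary method.
    eta_p: pseudo correlation ratio, measured from the array means of the
           independent probability numbers.
    """
    return eta_p / math.sqrt(1.0 - eta_c ** 2 + eta_p ** 2)


# Values for small loops and whorls, right hand, as printed in Table 8.
eta = restricted_correlation_ratio(eta_c=0.622, eta_p=0.313)
print(round(eta, 3))

# Perfect association: no variation is left within the arrays, so
# 1 - eta_c**2 is zero and the ratio reduces to unity.
print(restricted_correlation_ratio(eta_c=1.0, eta_p=0.5))
```

The first result agrees with the tabled .371; the second illustrates the condition which the modified denominator was designed to satisfy.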


The correlation ratio has been found by the method described above for all the 
restricted Tables; it has also been determined by the ordinary method for a few 
of the other Tables, but no correction for number of arrays has been applied. 
The results, together with the coefficients of correlation and of contingency, 
are given in Tables 8 and 9. 


Regression curves for all the restricted Tables are given on Plates (a—e). The 
continuous line is the independent probability curve and the broken line the 
curve of the observed means. It follows that the area between the curves, 
weighted, of course, with the marginal totals, gives a measure of the correlation 
ratio between the two characters. 


Each set of three figures for two particular characters, namely, those for the 
right hand, left hand, and both hands respectively, will generally be found to 
resemble each other closely. Irregularities occur chiefly with composites but this 
is not surprising if we consider the nature of this class. 


8c. Coefficients of Correlation of Restricted Tables. A glance at the diagrams
of means of the restricted Tables, Plates (a—e), shows that the regression is
generally non-linear; it is also evident that a sensible value of r is introduced by
the restriction*. Hence the value of r as found by the ordinary product-moment
method is (i) too small because of the skewness of regression and (ii) too large on
account of the restriction. These two contrary causes render the coefficient of
correlation of restricted Tables unreliable and therefore quite valueless; for even
if it sometimes agrees fairly closely with the correlation ratio and the contingency
coefficient, this agreement is probably due to the fact that the two sources of
error counterbalance each other.


In the remaining Tables, for which the results are given in Table 9, the 
regression is frequently skew; for this reason and for those given above, I have 
rejected the values of the coefficient of correlation in the sequence and have based 
my conclusions on the contingency coefficients, confirmed in general by the corre- 
lation ratio. 

* For example, in small loops and large loops, left, the case in which the difference between r and C
is the greatest, the independent probability numbers have the correlation coefficient -.512 (instead of
the theoretical value zero), as compared with -.507 of the observed numbers. In the case of arches
and small loops, both hands, r for the independent probability numbers is -.148, as against +.147 of
the observed numbers.
TABLE 8.

                                            y on x              x on y                        Table in
                                  r      η_c   η_p    η      η_c   η_p    η      η̄      C     Appendix
Right Hand
Arches and Small Loops          .062    .192  .242  .227    .228  .266  .264   .251   .305    I
Arches and Large Loops          .273    .273  .194  .198    .298  .215  .220   .209   .246    II
Arches and Whorls               .317    .359  .283  .290    .353  .286  .292   .291   .335    III
Arches and Composites           .146    .154  .139  .139    .157  .140  .140   .140   .154    IV
Small Loops and Large Loops     .422    .438  .074  .082    .430  .073  .080   .081   .166    V
Small Loops and Whorls          .585    .622  .313  .371    .574  .315  .385   .378   .408    VI
Small Loops and Composites      .270    .273  .207  .210    .273  .197  .201   .205   .228    VII
Large Loops and Whorls          .234    .239  .079  .081    .305  .112  .116   .097   .162    VIII
Large Loops and Composites      .093    .138  .093  .093    .104  .045  .045   .065   .128    IX
Whorls and Composites           .020    .162  .117  .126    .032  .061  .061   .088   .137    X

Left Hand
Arches and Small Loops          .038    .197  .271  .267    .249  .301  .292   .281   .335    XI
Arches and Large Loops          .333    .344  .253  .261    .375  .293  .302   .280   .309    XII
Arches and Whorls               .255    .286  .232  .235    .281  .232  .235   .235   .274    XIII
Arches and Composites           .154    .159  .144  .145    .164  .147  .148   .146   .162    XIV
Small Loops and Large Loops    -.507    .534  .095  .112    .510  .028  .025   .055   .120    XV
Small Loops and Whorls          .565    .623  .364  .422    .570  .310  .353   .386   .419    XVI
Small Loops and Composites      .365    .392  .298  .309    .401  .281  .293   .301   .311    XVII
Large Loops and Whorls          .160    .193  .129  .131    .260  .157  .160   .145   .236    XVIII
Large Loops and Composites      .080    .131  .063  .063    .088  .007  .007   .021   .108    XIX
Whorls and Composites           .115    .208  .217  .217    .173  .202  .201   .209   .244    XX

Both Hands
Arches and Small Loops          .147    .340  .387  .381    .279  .358  .350   .365   .440    XLVI
Arches and Large Loops          .364    .375  .298  .306    .482  .364  .384   .343   .383    XLVII
Arches and Whorls               .319    .409  .355  .363    .397  .341  .348   .355   .402    XLVIII
Arches and Composites           .203    .227  .220  .220    .226  .212  .213   .216   .239    XLIX
Small Loops and Large Loops     .471    .527  .162  .187    .476  .059  .071   .115   .234    L
Small Loops and Whorls          .638    .707  .421  .511    .670  .412  .478   .495   .503    LI
Small Loops and Composites      .382    .393  .331  .339    .402  .333  .341   .340   .365    LII
Large Loops and Whorls          .147    .194  .178  .178    .323  .228  .234   .204   .333    LIII
Large Loops and Composites      .020    .109  .085  .085    .087  .066  .066   .075   .181    LIV
Whorls and Composites           .150    .280  .295  .294    .195  .285  .233   .260   .320    LV


Remarks on Table 8. A comparison of the Correlation Ratio with the Contingency
Coefficient of the Restricted Tables.


(a) The values of η and C are generally in very close agreement.


(b) The value obtained for η is, however, always less than that for C.


(c) In only three cases does the difference between η and C exceed .1. The
probable error of η ranges from .015 for the smallest values to .011 for the largest;
it will also be remembered that no corrections have been applied to η nor to C,
since we do not yet know what these corrections should be for restricted Tables.
We may assume, however, that, as with ordinary Tables, correction would modify
η less than it would diminish C, and the corrected values of η and C would thus,
in all probability, agree somewhat more closely than at present.


TABLE 9.
Right and Left Hands.

                                                                                          Table in
                                              r             C*     C†         η‡          Appendix
Arches R and Arches L                   +.686 ± .008      .664   .688         —            XXI
Arches R and Small Loops L              +.160 ± .015      .285   .302    .234 ± .014       XXII
Arches R and Large Loops L              -.297 ± .014      .322   .337         —            XXIII
Arches R and Whorls L                   -.257 ± .014      .290   .307         —            XXIV
Arches R and Composites L               -.140 ± .015      .118   .161         —            XXV
Small Loops R and Arches L              +.185 ± .015      .309   .325    .283 ± .014       XXVI
Small Loops R and Small Loops L         +.711 ± .007      .631   .635         —            XXVII
Small Loops R and Large Loops L         -.378 ± .013      .382   .393         —            XXVIII
Small Loops R and Whorls L              -.494 ± .011      .499   .506         —            XXIX
Small Loops R and Composites L          -.290 ± .014      .292   .309         —            XXX
Large Loops R and Arches L              -.275 ± .014      .297   .314         —            XXXI
Large Loops R and Small Loops L         -.217 ± .014      .262   .282         —            XXXII
Large Loops R and Large Loops L         +.550 ± .011      .519   .525         —            XXXIII
Large Loops R and Whorls L              -.123 ± .015      .210   .235    .159 ± .015       XXXIV
Large Loops R and Composites L          -.017 ± .015      .000   .089         —            XXXV
Whorls R and Arches L                   -.308 ± .014      .337   .351         —            XXXVI
Whorls R and Small Loops L              -.555 ± .010      .534   .540         —            XXXVII
Whorls R and Large Loops L              +.021 ± .015      .283   .301    .170 ± .015       XXXVIII
Whorls R and Whorls L                   +.741 ± .007      .670   .672         —            XXXIX
Whorls R and Composites L               +.280 ± .014      .296   .313         —            XL
Composites R and Arches L               -.146 ± .015      .115   .159         —            XLI
Composites R and Small Loops L          -.188 ± .014      .172   .203         —            XLII
Composites R and Large Loops L          +.131 ± .015      .125   .166         —            XLIII
Composites R and Whorls L               +.059 ± .015      .127   .168    .105 ± .015       XLIV
Composites R and Composites L           +.250 ± .014      .367   .379         —            XLV
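
The probable errors attached to r in Table 9 can be checked with a small sketch. It assumes the usual expression for the probable error of a correlation coefficient, .67449 (1 - r²)/√N, which this section does not state explicitly, and takes N = 2000 from the table totals; the function name is illustrative.

```python
import math


def probable_error_of_r(r, n):
    """Probable error of a product-moment correlation r from n individuals.

    Assumes the classical expression 0.67449 * (1 - r**2) / sqrt(n).
    """
    return 0.67449 * (1.0 - r * r) / math.sqrt(n)


N = 2000  # number of individuals, from the totals of the contingency Tables

# A few coefficients from Table 9, each paired with its printed probable error.
checks = {0.686: 0.008, 0.160: 0.015, -0.494: 0.011, 0.741: 0.007}
for r, printed in checks.items():
    print(r, round(probable_error_of_r(r, N), 3), printed)
```

On this assumption the computed values, rounded to three places, agree with those printed against the same coefficients.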


Further Remarks on Tables 8 and 9. The results given in these Tables show:— 


(a) A general agreement between the correlations for the same pair of classes 
of prints whether obtained by different methods from the same Table (omitting 
values of r in Table 8), or from different Tables, the principal exceptions being 
those for which the correlation ratio has been calculated in Table 9. 


(b) A wide range in the magnitude of the results for different pairs of prints. 


(c) The association between any class of print in one hand and the same class
in the other is, in general, as might be expected, much higher than any other
association of these Tables. Omitting the composites, the remaining four contingency
coefficients between the same class in different hands are, with one
exception, each greater than any others; the same may be said of the correlation
coefficients, the exception in each case being the correlation between whorls in
the right and small loops in the left hand, which is slightly greater than the
correlation between the large loops in the right and left. Even with the composites
the contingency for the two hands is greater than that for composites with
any other class found from any of the Tables, while the correlation coefficients have
five exceptions to this general rule.

* Values of contingency coefficients corrected for number of cells.

† Values of contingency coefficients not corrected for number of cells, given for the sake of comparison
with other Tables.

‡ The value of η is in all cases $\eta_{yx}$.


(d) The contingency coefficients given in Table 8, where the two hands are
taken together, are, with two exceptions, greater than the corresponding coefficients
in other parts of Tables 8 and 9. The exceptions are (1) the contingency
coefficient .234 for small loops with large loops of Table 8 is slightly less than those
in Table 9; and (2) the coefficient .503 for small loops with whorls in Table 8 is
rather less than that for whorls (right) with small loops (left) of Table 9.


A further study of the above Tables shows that :— 


Large loops are closest to arches.
Arches are closest to whorls.
Whorls are closest to small loops.
Small loops are closest to whorls and then to arches.
Composites are closest to small loops and then to arches.
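
The list above can be retraced by ranking, for each class, the contingency coefficients it has with every other class. The sketch below uses only the right-hand coefficients (the C column of Table 8, Tables I to X), so it illustrates the procedure rather than reproducing the whole body of evidence on which the text relies; the names used are hypothetical.

```python
classes = ["Large Loops", "Arches", "Whorls", "Small Loops", "Composites"]

# Right-hand contingency coefficients between each pair of classes,
# taken from the C column of Table 8 (Tables I to X of the Appendix).
coefficients = [
    [1.0, 0.246, 0.162, 0.166, 0.128],
    [0.246, 1.0, 0.335, 0.305, 0.154],
    [0.162, 0.335, 1.0, 0.408, 0.137],
    [0.166, 0.305, 0.408, 1.0, 0.228],
    [0.128, 0.154, 0.137, 0.228, 1.0],
]


def ranked_neighbours(i):
    """Other classes ordered from closest to most remote for class i."""
    others = [j for j in range(len(classes)) if j != i]
    others.sort(key=lambda j: coefficients[i][j], reverse=True)
    return [classes[j] for j in others]


# The two nearest classes for each class of print.
closest = {classes[i]: ranked_neighbours(i)[:2] for i in range(len(classes))}
for name, neighbours in closest.items():
    print(name, "->", neighbours)
```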


The suggestion thus arises that arches and whorls have the closest natural 
resemblance to intermediate sized loops, and also that the “natural order” of the 
classes of finger-prints is :— 


(1) Large Loops, (2) Arches, (3) Whorls, (4) Small Loops, (5) Composites. 


This is more clearly seen from the following arrangement of the contingency 
coefficients. 


TABLE 10.
Contingency Coefficients of Right Hand.

                 Large Loops   Arches   Whorls   Small Loops   Composites
Large Loops           1         .246     .162       .166*         .128
Arches              .246          1      .335       .305          .154*
Whorls              .162        .335       1        .408          .137
Small Loops         .166        .305     .408         1           .228
Composites          .128        .154     .137       .228            1

TABLE 11.
Contingency Coefficients of Left Hand.

                 Large Loops   Arches   Whorls   Small Loops   Composites
Large Loops           1         .309     .236       .120          .103
Arches              .309          1      .274       .335*         .162
Whorls              .236        .274       1        .419          .244
Small Loops         .120        .335     .419         1           .311
Composites          .103        .162     .244       .311            1

* Coefficients which do not agree with the proposed “natural order.”
TABLE 12a.
Contingency Coefficients of Right Hand with Left.

                                    Right Hand.
Left Hand.       Large Loops   Arches   Whorls   Small Loops   Composites
Large Loops         .519        .322     .283       .382*         .125*
Arches              .297        .664     .337       .309          .115
Whorls              .210        .290     .670       .499          .127
Small Loops         .262*       .285     .534       .631          .172
Composites          .000        .118     .296*      .292          .367

(Corrected for number of Cells.)

TABLE 12b.
Contingency Coefficients of Right Hand with Left.

                                    Right Hand.
Left Hand.       Large Loops   Arches   Whorls   Small Loops   Composites
Large Loops         .525        .337     .301       .393*         .166*
Arches              .314        .688     .351       .325          .159
Whorls              .235        .307     .672       .506          .168
Small Loops         .282*       .302     .540       .635          .203
Composites          .089        .161     .313*      .309          .379

(Not corrected for number of Cells.)


TABLE 13.
Contingency Coefficients of both Hands taken together.

                 Large Loops   Arches   Whorls   Small Loops   Composites
Large Loops           1         .383     .333       .234          .181
Arches              .383          1      .402       .440*         .239
Whorls              .333        .402       1        .503          .320
Small Loops         .234        .440     .503         1           .361
Composites          .181        .239     .320       .361            1


The contingency coefficients of the right hand with the left have been given 
both corrected and uncorrected for the number of cells and both sets of results 
point to the same conclusion. 


* Coefficients which do not agree with the proposed ‘‘ natural order.” 
The proposed “natural order” of the types is supported by the above Tables,
only eight coefficients out of the fifty-five not being in complete agreement. In 
four of these cases the difference is very small, most likely well within the probable 
errors, and they may therefore be regarded as insignificant. 


A similar arrangement of the correlation coefficients still further supports the 
proposed order, though not quite so conclusively, probably on account of spurious 
correlations. 


9. Association between the various Fingers. In this section I have calculated 
the contingency coefficients only, the classes being arranged in the order found in 
Section 8, p. 445. 


It would, of course, be possible to obtain Tables with much finer grouping 
either by further subdivision of the loops or by making use of the “secondary 
classification” described by Galton or Henry (see footnote, p. 421). All such finer 
grouping would raise the contingency; the extra labour involved by the addition 
of some three or four rows and columns to each Table would, however, be so con- 
siderable that the question arises whether some allowance can be made for the 
coarser grouping employed. This can only be done if we may suppose a “natural
order” of some kind with a frequency roughly approaching the normal. This
gives a rough upper limit to the contingency and is the purport of the work in
the earlier sections on “natural order” and corrections.


As an example of the effect which finer grouping has on contingency I have
found the contingency between the index fingers of the two hands by means of
a “seven by seven” Table, the radial and ulnar loops being separated, and also by
means of a “five by five” Table in which no distinction is drawn between the
radial and ulnar loops. The results in this case, not corrected for grouping, are
.653 and .626; when corrected for grouping these results become .704 and .698,
respectively. They are so nearly identical as to suggest that no very material
advantage would be gained by a further subdivision of classes.


On the assumption that there is a certain degree of continuity in the distri- 
bution I have corrected all the results for grouping as well as for the number of 
cells. The method employed for the former correction is fully described by 
Professor Pearson in Biometrika*. 


The following Tables give the contingency coefficients for each finger with 
each other finger. The two sets of coefficients are included, viz. those which are 
not corrected for grouping, that is, which are obtained without any assumption of 
a “natural order” and those which are so corrected, in order that the conclusions 
based on the latter may be compared with those based on the former. 


* “On the Measurement of the Influence of ‘Broad Categories’ on Correlation,” by Karl Pearson,
F.R.S., Biometrika, Vol. IX. pp. 116—139.
TABLE 14a.
Contingency Coefficients of Right Hand.

           R₁          R₂          R₃      R₄      R₅
R₁          1      .429 ± .011    .455    .469    .473
R₂        .429          1         .645    .576    .519
R₃        .455        .645          1     .665    .565
R₄        .469        .576        .665      1     .690
R₅        .473        .519        .565    .690      1

(Corrected for Grouping.)

TABLE 14b.
Contingency Coefficients of Right Hand.

           R₁          R₂          R₃      R₄      R₅
R₁          1      .373 ± .012    .379    .400    .385
R₂        .373          1         .561    .511    .441
R₃        .379        .561          1     .568    .460
R₄        .400        .511        .568      1     .576
R₅        .385        .441        .460    .576      1

(Not corrected for Grouping.)


TABLE 15a.
Contingency Coefficients of Left Hand.

           L₁      L₂      L₃      L₄         L₅
L₁          1     .503    .465    .474    .508 ± .012
L₂        .503      1     .675    .609       .539
L₃        .465    .675      1     .724       .585
L₄        .474    .609    .724      1        .711
L₅        .508    .539    .585    .711         1

(Corrected for Grouping.)

TABLE 15b.
Contingency Coefficients of Left Hand.

           L₁      L₂      L₃      L₄         L₅
L₁          1     .435    .390    .401    .410 ± .014
L₂        .435      1     .582    .529       .447
L₃        .390    .582      1     .611       .471
L₄        .401    .529    .611      1        .577
L₅        .410    .447    .471    .577         1

(Not corrected for Grouping.)


TABLE 16a.
Contingency Coefficients of Right Hand with Left.

           R₁              R₂                R₃        R₄            R₅
L₁        .777            .441              .440      .446          .424
L₂        .479     .698 [5 × 5 Table]       .640      .559          .521
                   .704 [7 × 7 Table]
L₃        .427            .608              .786      .669          .561
L₄        .446            .587              .663      .814          .675
L₅        .501            .515              .537   [illegible]      .899

(Corrected for Grouping.)

TABLE 16b.
Contingency Coefficients of Right Hand with Left.

           R₁              R₂                R₃           R₄      R₅
L₁        .649            .385              .368         .383    .347
L₂        .412     .626 [5 × 5 Table]    [illegible]     .493    .439
                   .653 [7 × 7 Table]
L₃        .356            .530              .656         .572    .459
L₄        .375            .514              .558         .702    .556
L₅        .402            .432              .431         .534    .707

(Not corrected for Grouping.)


Remarks on Tables 14a, 15a, and 16a. (a) It will be seen from these 
Tables that the association of types between corresponding fingers of the two 
hands is, with one exception, always closer than that between any other pair of 
fingers. The order of magnitude of these associations is :— 

(1) Little Finger, (2) Ring Finger, (3) Middle Finger, (4) Thumb, (5) Index 
Finger. 

(b) If we omit the thumb for the present, leaving it for separate comment,
and consider the association between corresponding fingers as of the “first order,”
that between fingers of consecutive rank, such as R₂ and R₃, or R₂ and L₃, as of the
“second order,” and so on, we notice a significant relation between any particular
association and its “order.” Thus:


First order associations range from .899 to .704 or .698,
Second order associations range from .724 to .608,
Third order associations range from .609 to .537,
Fourth order associations range from .539 to .515.


The amount of overlapping in these ranges appears to be quite insignificant.

(c) It follows from (b) that if in any of these Tables we start from a first order
association and pass in any direction through those of other orders we find a
continuous and rapid fall; that is, a finger is always more closely related to a
consecutive finger than to one more remote (but see (a)); and the greater the
difference in rank between two fingers, whether on the same or on different hands,
the less close is the association between them.


(d) The association between any pair of fingers in one hand is, in general, 
closer than either of the corresponding associations between a finger of the right 
and one of the left hand. There is one exception to this rule in associations of the 
second order, one in the third and one in the fourth. 


(e) The associations of the left hand are in every case closer than the corre- 
sponding associations of the right. 


(f) The associations of either thumb with any finger all fall below those of
the fourth order of (b), and the range of the sixteen coefficients is only from .424
to .508. As it is difficult to base any conclusions on these figures as to the
relations between the thumb and the various fingers, I have carefully checked
them by reworking the whole of the calculations involved, but have in every case
arrived at the same result. I have also found the probable error* for the largest
and for one of the smallest coefficients of the set. As the contingency coefficients
are all of the same order of magnitude and the number of individuals the same in
all cases, the probable errors of all will be of about the same magnitude and it is
unnecessary to calculate more. The probable errors in the two cases being of the
order .011, the differences in the contingency coefficients may be regarded as
insignificant. Although in three cases out of the four the contingencies of the
thumb with the middle, ring and little finger respectively are in ascending order
of magnitude, the differences are so small in comparison with the probable errors
that no conclusion can be drawn as to the relations between the thumb and the
various fingers. We may notice, however, that the rule (d) holds good for the
thumbs with but two exceptions.


The contingency coefficients given in Tables 14b, 15b, and 16b are all smaller
than the corresponding results of the other series, but a careful study will show
that the remarks (a) to (f) almost invariably apply to these Tables also.
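
Remarks (b) and (e) lend themselves to a mechanical check. The sketch below is limited to the same-hand coefficients corrected for grouping (Tables 14a and 15a), so it reproduces only the upper ends of the ranges quoted in (b); the lower ends come from the right-with-left coefficients of Table 16a, which are not entered here. All names are illustrative.

```python
from collections import defaultdict

# Same-hand contingency coefficients corrected for grouping, keyed by the
# pair of finger ranks (1 = thumb ... 5 = little finger).
right = {(1, 2): 0.429, (1, 3): 0.455, (1, 4): 0.469, (1, 5): 0.473,
         (2, 3): 0.645, (2, 4): 0.576, (2, 5): 0.519,
         (3, 4): 0.665, (3, 5): 0.565, (4, 5): 0.690}   # Table 14a
left = {(1, 2): 0.503, (1, 3): 0.465, (1, 4): 0.474, (1, 5): 0.508,
        (2, 3): 0.675, (2, 4): 0.609, (2, 5): 0.539,
        (3, 4): 0.724, (3, 5): 0.585, (4, 5): 0.711}    # Table 15a

# Remark (e): every left-hand association exceeds the corresponding
# right-hand association.
left_always_closer = all(left[pair] > right[pair] for pair in right)

# Remark (b): omit the thumb and group by "order", where fingers of
# consecutive rank are of the second order, and so on.
by_order = defaultdict(list)
for table in (right, left):
    for (i, j), value in table.items():
        if i != 1:
            by_order[j - i + 1].append(value)

ranges = {order: (max(v), min(v)) for order, v in sorted(by_order.items())}
print(left_always_closer)
for order, (highest, lowest) in ranges.items():
    print(order, highest, lowest)
```

The highest values found for the second, third and fourth orders are .724, .609 and .539, as stated in (b).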


Note. In some preliminary work on this paper I classified the types as
follows: (1) Arches and loops with 1–3 ridges, (2) Loops with 4–10 ridges,
(3) Loops with 11–14 ridges, (4) Loops with 15 or more ridges, (5) Whorls,
(6) Composites. With this classification the following contingency coefficients
were found for corresponding fingers of the two hands: Thumb .686, Forefinger
.642, Middle finger .686, Ring finger .730, Little finger .738. These results,
which were not corrected for grouping, are seen to agree very closely with those
of Table 16b, the values being rather larger probably on account of the slightly
finer grouping.

* The method employed is that given in Biometrika, Vol. V. Parts I. and II., “On the Probable
Error of Mean-Square Contingency,” by John Blakeman and Karl Pearson.

10. Comparison with Results of Previous Work. It would be well to compare 
briefly some of my results with those of the two works mentioned on p. 421. 

Whiteley and Pearson arrived at the following conclusions :— 

(i) The hand is a very highly correlated organ, far more highly correlated 
than the skull and even somewhat more so than the long bones. 

(ii) The parts of the left hand are distinctly more closely correlated than
those of the right.


(iii) The order of correlation of the first finger joints is identical for both
hands. This order is as follows:


(a) The external fingers have the least correlation and the little finger always
less than the index.


(b) A finger has always more correlation with a second than with any other
finger from which it is separated by the second.

(iv) With corresponding members on both sides the extreme pairs show least 
correlation, and the pair of middle fingers higher correlation than the pair of ring 
fingers. 

In the paper of Miss Lewenz and Miss Whiteley the chief results which are 
comparable with those for the finger-prints are the following :— 

(v) There is a slight, but we cannot say definitely significant, preponderance 
in the correlations of the right hand bones over those of the left. 

(vi) Dividing the hand into marginal members, i.e. thumb, index and little 
fingers, and central members, i.e. middle and ring fingers, and the bones into 
“lower bones,” i.e. distal and middle phalanges, and “upper bones,” i.e. metacarpal 
bones and proximal phalanges, the correlations roughly speaking are highest for 
the upper bones of the central members and become less as we move out from this 
upper centre towards the lower and marginal parts of the hand. This is true 
whether we take pairs in lateral or in longitudinal series. 

(vii) The highest correlations occur between corresponding bones of the right 
and left hands. 

(viii) Generally there is a “rule of neighbourhood,” i.e. any bone is more 
closely correlated with a second of the same series than with any other from which 
it is separated by that second. 

The above conclusions are to a certain extent mutually corroborative: e.g. (vi)
and (iv) are in agreement, and (viii) agrees in substance with (iii b). Again (vii)
agrees with Table IV, p. 130, of the “First Study,” while (iii a) is in general supported
by Table XXII of the “Second Study.” On the other hand (ii) and (v) do
not agree. It should be noted, however, that the “First Study” was based on the
measurements of the first finger joint only of both hands of 551 women, while for
the “Second Study,” in which all the finger bones were measured, only 37 to 44
skeleton hands were available. The writers of the latter paper state that in consequence
of the comparatively small number of bones measured they look upon
that study “as one of suggestion rather than of definite statistical proof,” and it
is possible that with more adequate data their results might have been somewhat
modified and exceptions less numerous.

There appears to be no direct connection between finger bones and the patterns 
of finger-prints, but it is distinctly interesting to find that some of the most 
striking relations discovered amongst the former also exist in the latter. In
particular, my conclusion (a) agrees with (vii), (c) with (iii b) and with (viii), and
(e) with (ii) but not with (v).

11. Concluding Remarks. The most important conclusions reached in this 
paper have been summarized on p. 432, and in the Remarks on Tables 8 and 9, 
pp. 443—445, and on Tables 14—16, p. 449; it scarcely seems necessary to re- 
capitulate them, but a comparison will show an almost perfect agreement although 
the sets of results have been obtained by entirely different methods. 


The essential results of the present paper are that finger-prints are not scat- 
tered at random over the fingers; certain types are more or less peculiar to certain 
fingers, and further the appearance of one type is correlated with the appearance 
of a second. In this respect certain fingers are more closely related to each other
than to any third finger, and the distribution of this relationship is in general 
similar to what is known of the like distribution of the correlations of the bones of 
the same fingers. 

It has been already stated that the material used is taken entirely from adult 
males of the lower type of the artisan and labouring classes; it would be of 
interest to compare the results obtained with those found from the finger-prints 
of females of the same grade of society, and also when the material is drawn 
from the professional classes. 

Tables I to XX, and XLVI to LV, are of a type which I have not previously 
met with; novel methods have accordingly been employed in calculating coefficients 
of contingency and correlation ratios from those Tables. The general investigation 
of Tables of this type offers an interesting problem, demanding further study. 

I am deeply grateful to Professor Pearson for placing at my disposal the 
necessary material together with a number of books and memoirs bearing on 
the subject, and for much valuable assistance given during the course of the 
investigation. 

It can scarcely be expected, with such a mass of numerical calculation involved, 
that the work should be entirely free from inaccuracies, but I trust that no serious 
errors have escaped detection. The laborious arithmetic has been much lightened 
by the use of a calculator, for the loan of which my thanks are due to the Govern- 
ment Grant Committee of the Royal Society of London. 


The Tables on which the preceding calculations are based are given in the 
Appendix, pp. 453—478. 


APPENDIX.

TABLE I.
Arches and Small Loops, Right.

                                Arches, R.
Small Loops, R.      0      1      2      3      4      5   Totals
Totals            1541    294    111     33     16      5     2000
TABLE II.
Arches and Large Loops, Right.

                                Arches, R.
Large Loops, R.      0      1      2      3      4      5   Totals
0                    ?      ?     49     23     16      5      457
1                  489    114     46     10      0      —      659
2                  400     66     12      0      —      —      478
3                  245     31      4      —      —      —      280
4                    ?      ?      —      —      —      —      108
5                   18      —      —      —      —      —       18
Totals            1541    294    111     33     16      5     2000
TABLE III.
Arches and Whorls, Right.

                                Arches, R.
Whorls, R.           0      1      2      3      4      5   Totals
0                  512    215     86     27     16      5      861
1                  406     61     24      6      0      —      497
2                  275     16      1      0      —      —      292
3                  168      2      0      —      —      —      170
4                  130      0      —      —      —      —      130
5                   50      —      —      —      —      —       50
Totals            1541    294    111     33     16      5     2000
TABLE IV.
Arches and Composites, Right.

                                Arches, R.
Composites, R.                                              Totals
0                                                             1423
1                                                              442
2                                                              118
3                                                               14
4                                                                3
5                                                                0
Totals            1541    294    111     33     16      5     2000
TABLE V.
Small Loops and Large Loops, Right.

                              Small Loops, R.
Large Loops, R.      0      1      2      3      4      5   Totals
0                  104     69     67     81     91     45      457
1                  144     99    142    154    120      —      659
2                  143    141    123     71      —      —      478
3                  119    101     60      —      —      —      280
4                   65     43      —      —      —      —      108
5                   18      —      —      —      —      —       18
Totals             593    453    392    306    211     45     2000


TABLE VI.
Small Loops and Whorls, Right.

                              Small Loops, R.
Whorls, R.           0      1      2      3      4      5   Totals
0                   78    144    204    211    179     45      861
1                  106    153    126     80     32      —      497
2                  130     92     55     15      —      —      292
3                  125     38      7      —      —      —      170
4                  104     26      —      —      —      —      130
5                   50      —      —      —      —      —       50
Totals             593    453    392    306    211     45     2000


TABLE VII.
Small Loops and Composites, Right.

                              Small Loops, R.
Composites, R.       0      1      2      3      4      5   Totals
0                  353    294    279    257    195     45     1423
1                  165    121     93     47     16      —      442
2                   62     34     20      2      —      —      118
3                   10      4      0      —      —      —       14
4                    3      0      —      —      —      —        3
5                    0      —      —      —      —      —        0
Totals             593    453    392    306    211     45     2000


TABLE VIII.
Large Loops and Whorls, Right.

                              Large Loops, R.
Whorls, R.           0      1      2      3      4      5   Totals
0                  197    289    162    133     62     18      861
1                   86    130    148     87     46      —      497
2                   47     84    101     60      —      —      292
3                   27     76     67      —      —      —      170
4                   50     80      —      —      —      —      130
5                   50      —      —      —      —      —       50
Totals             457    659    478    280    108     18     2000
TABLE IX.
Large Loops and Composites, Right.

                              Large Loops, R.
Composites, R.       0      1      2      3      4      5   Totals
0                  317    471    318    205     94     18     1423
1                    ?      ?      ?      ?     14      —      442
2                    ?      ?      ?      ?      —      —      118
3                    9      2      3      —      —      —       14
4                    ?      ?      —      —      —      —        3
5                    0      —      —      —      —      —        0
Totals             457    659    478    280    108     18     2000


TABLE X.
Whorls and Composites, Right.
Whorls, R.

TABLE XI.
Arches and Small Loops, Left.
Arches, L.

Small Loops, L.   Totals
0                    496
1                    425
2                    366
3                    315
4                    283
5                    115
Totals              2000

TABLE XII.
Arches and Large Loops, Left.
Arches, L.
Large Loops, L.
TABLE XIII.
Arches and Whorls, Left.

                                Arches, L.
Whorls, L.           0      1      2      3      4      5   Totals
Totals            1542    290    102     42     20      4     2000

TABLE XIV.
Arches and Composites, Left.

                                Arches, L.
Composites, L.       0      1      2      3      4      5   Totals
0                    ?      ?      ?      ?     20      4     1436
1                    ?      ?      ?      ?      ?      —      434
2                    ?      ?      ?      ?      —      —      103
3                    ?      ?      ?      —      —      —       22
4                    ?      ?      —      —      —      —        4
5                    ?      —      —      —      —      —        1
Totals            1542    290    102     42     20      4     2000


TABLE XV. 
Small Loops and Large Loops, Left. 
Small Loops, L. 


Large Loops, L. |   0 |   1 |   2 |   3 |   4 |   5 | Totals
0               |  82 |  48 |  63 |  83 | 124 | 115 |    515
1               | 121 |  73 |  94 | 130 | 159 |   - |    577
2               |  97 | 116 | 108 | 102 |   - |   - |    423
3               | 100 | 110 | 101 |   - |   - |   - |    311
4               |  63 |  78 |   - |   - |   - |   - |    141
5               |  33 |   - |   - |   - |   - |   - |     33
Totals          | 496 | 425 | 366 | 315 | 283 | 115 |   2000


TABLE XVI. 
Small Loops and Whorls, Left. 
Small Loops, L. 


MA Ss WHOS 


Whorls, L. Composites, L. 


Composites, Z. 


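H. WAITE


TABLE XVII.
Small Loops and Composites, Left.
Small Loops, L.

Composites, L. |   0 |   1 |   2 |   3 |   4 |   5 | Totals
0              | 234 | 292 | 264 | 264 | 267 | 115 |   1436
1              | 176 | 110 |  84 |  48 |  16 |   - |    434
2              |  64 |  19 |  17 |   3 |   - |   - |    103
3              |  17 |   4 |   1 |   - |   - |   - |     22
4              |   4 |   - |   - |   - |   - |   - |      4
5              |   1 |   - |   - |   - |   - |   - |      1
Totals         | 496 | 425 | 366 | 315 | 283 | 115 |   2000


TABLE XVIII.
Large Loops and Whorls, Left.
Large Loops, L.

Whorls, L. |   0 |   1 |   2 |   3 |   4 |   5 | Totals
0          | 338 | 315 | 195 | 165 | 106 |  33 |   1152
1          |  63 |  94 | 114 | 102 |  35 |   - |    408
2          |  22 |  58 |  79 |  44 |   - |   - |    203
3          |  26 |  63 |  35 |   - |   - |   - |    124
4          |  45 |  47 |   - |   - |   - |   - |     92
5          |  21 |   - |   - |   - |   - |   - |     21
Totals     | 515 | 577 | 423 | 311 | 141 |  33 |   2000


TABLE XIX.
Large Loops and Composites, Left.
Large Loops, L.

Composites, L. |   0 |   1 |   2 |   3 |   4 |   5 | Totals
0              | 380 | 399 | 276 | 230 | 118 |  33 |   1436
1              | 102 | 124 | 119 |  66 |  23 |   - |    434
2              |  23 |  41 |  24 |  15 |   - |   - |    103
3              |   7 |  11 |   4 |   - |   - |   - |     22
4              |   2 |   2 |   - |   - |   - |   - |      4
5              |   1 |   - |   - |   - |   - |   - |      1
Totals         | 515 | 577 | 423 | 311 | 141 |  33 |   2000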
TABLE XX. 
Whorls and Composites, Left. 
Whorls, L. 
Composites, L. |    0 |   1 |   2 |   3 |   4 |   5 | Totals
0              |  924 | 249 | 120 |  66 |  56 |  21 |   1436
1              |  173 | 120 |  59 |  46 |  36 |   - |    434
2              |   44 |  26 |  21 |  12 |   - |   - |    103
3              |    8 |  11 |   3 |   - |   - |   - |     22
4              |    2 |   2 |   - |   - |   - |   - |      4
5              |    1 |   - |   - |   - |   - |   - |      1
Totals         | 1152 | 408 | 203 | 124 |  92 |  21 |   2000

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TABLE XXI. 
Arches, Right, and Arches, Left. 
Arches, R. 
Arches, L. |    0 |   1 |   2 |   3 |   4 |   5 | Totals
0          | 1369 | 145 |  24 |   3 |   1 |   0 |   1542
1          |  146 |  97 |  40 |   7 |   0 |   0 |    290
2          |   24 |  45 |  26 |   5 |   2 |   0 |    102
3          |    2 |   6 |  16 |  12 |   6 |   0 |     42
4          |    0 |   1 |   5 |   6 |   5 |   3 |     20
5          |    0 |   0 |   0 |   0 |   2 |   2 |      4
Totals     | 1541 | 294 | 111 |  33 |  16 |   5 |   2000


TABLE XXII. 
Arches, Right, and Small Loops, Left. 
Arches, R. 
Small Loops, L. |    0 |   1 |   2 |   3 |   4 |   5 | Totals
0               |  473 |  14 |   5 |   0 |   2 |   2 |    496
1               |  342 |  47 |  17 |  10 |   6 |   3 |    425
2               |  249 |  68 |  29 |  15 |   5 |   0 |    366
3               |  209 |  74 |  28 |   2 |   2 |   0 |    315
4               |  187 |  66 |  26 |   4 |   0 |   0 |    283
5               |   81 |  25 |   6 |   2 |   1 |   0 |    115
Totals          | 1541 | 294 | 111 |  33 |  16 |   5 |   2000


TABLE XXIII. 
Arches, Right, and Large Loops, Left. 


Arches, R. 

Totals 

nD 515 

or 577 

8 423 

4 311 

o 141 

2 33 
eS 

2000 

TABLE XXIV. 
Arches, Right, and Whorls, Left. 
Arches, R. 

g Totals 

. 1152 

> 408 

r) 203 

B 124 

a 92 

= 21 


Totals 


TABLE XXV. 
Arches, Right, and Composites, Left. 
Arches, R. 

2 3 4 5 Totals 
= Obie esr 16. |) 5. fide 
2B 13 4 0) 0) 434 
2 2 1 0 0 103 
2 1 0 0 4) 0 22 
Q4 0) 0 0 0 4 
g 0 0 ) 0) 1 
[e) = aD 

Totals J 1541 | 294 111 33 16 7) 2000 
| 


TABLE XXVI. 
Small Loops, Right, and Arches, Left. 
Small Loops, R. 


0 1 2 3 4 5 Totals 


558 369 QO liter 129 30 1542 


ot Dae aon |) va | me | by | tL |b 290 
gf Mieilbesisfe nis) ul) aoe |) 01 4 | 102 
3 Tech aley ihe ie 3 0 49 
e 3 5 7 4 1 0 20 
“a 2 2 0 0 0 0 4 


593 453 | 392 306 211 45 2000 


TABLE XXVII. 
Small Loops, Right, and Small Loops, Left. 


Totals 
496 
425 
366 
315 
283 
115 


Small Loops, L. 
Aes WeRS 


Totals 2000 


TABLE XXVIII. 
Small Loops, Right, and Large Loops, Left. 
Small Loops, &. 


Totals 


515 
577 
423 
311 
141 

33 


Large Loops, L. 


2000 
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Small Loops, L. 
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TABLE XXIX. 
Small Loops, Right, and Whorls, Left. 
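Small Loops, R.

Whorls, L. |   0 |   1 |   2 |   3 |   4 |   5 | Totals
0          | 159 | 217 | 286 | 249 | 198 |  43 |   1152
1          | 130 | 143 |  79 |  43 |  12 |   1 |    408
2          | 112 |  58 |  19 |  13 |   1 |   0 |    203
3          |  92 |  25 |   6 |   1 |   0 |   0 |    124
4          |  79 |  10 |   2 |   0 |   0 |   1 |     92
5          |  21 |   0 |   0 |   0 |   0 |   0 |     21
Totals     | 593 | 453 | 392 | 306 | 211 |  45 |   2000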
TABLE XXX.
Small Loops, Right, and Composites, Left.
Small Loops, R.

Composites, L. |   0 |   1 |   2 |   3 |   4 |   5 | Totals
0              | 323 | 311 | 299 | 267 | 191 |  45 |   1436
1              | 194 | 109 |  82 |  32 |  17 |   0 |    434
2              |  60 |  22 |  11 |   7 |   3 |   0 |    103
3              |  12 |  10 |   0 |   0 |   0 |   0 |     22
4              |   4 |   0 |   0 |   0 |   0 |   0 |      4
5              |   0 |   1 |   0 |   0 |   0 |   0 |      1
Totals         | 593 | 453 | 392 | 306 | 211 |  45 |   2000
TABLE XXXI. 
Large Loops, Right, and Arches, Left. 
Large Loops, &. 
| 0 | 1 | 2 3 | } | 5 Totals 
0 290 483 394 | 255 1030 ell 1542 
I 22 ; 
3 
J 
ii) 
Totals} 457. | 659 | 478 | 280 | 108 | 18 | 2000 


TABLE XXXII. 
Large Loops, Right, and Small Loops, Left.
Large Loops, R. 


Totals 


496 
425 
366 
B15 
283 
115 


Totals 3 2000 


— oe 
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TABLE XXXIII. 
Large Loops, Right, and Large Loops, Left. 
Large Loops, R. — 


a baer eral | | | 

lO A OR is | isa) 3 
ome 7 121 | 956 | 135 | 59 5 
Sa 2 53 | 135 | 149 | 67 | 18 
4 3 17 | 66 | 101 77 =| “Al 
) 4 2 20 36 AT 32 
SS 5, 0 4 ¢ 
4 

Totals | 457 | 659 


. TABLE XXXIV. 
Large Loops, Right, and Whorls, Left. 
Large Loops, R. 


Whorls, L. 
Meer Cs DMS 


Totals 


TABLE XXXV. 
Large Loops, Right, and Composites, Left. 


| | 0 | ae 3 J | 5 | Totals | 
——— — 
. 0 Boel ATeeasoSn elo 1 Sle | 13 1436 
cs 1 90 | 140 | 107 70 93--) 4 434 
| 2 Hh eels ees, alg) 4" 103 
Pima 4 St Rates Ce eal Olle 20 22 
S 4 2 2 Olea OF Oe |e 4 
5 5 Ss ieee Oita |e): OM | 102 Inn 20 1 
O | | aot 

Totals} 457 | 659 | 478 280 | 10s | 18 | 2000 


TABLE XXXVI. 
Whorls, Right, and Arches, Left. 
. Whorls, R. 


S 0 | 526 | 405 | 262 | 169 | 130 | 50 | 1542 
al 193 t6On hoya sie O:| 0 4 290 
Ms 2 80 | 19 3. || 20 One 20) AP 102 
S| 3 38 4 0 0) One) (0) 42 | 
J 20; nO OF). 20 OF 0 20 

nl ode deh OG |e 20 O20) 70%), 10 4 


| Totals | 861 | 497 292 |170 | 130 | 50 | 2000 
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TABLE XXXVII. 
Whorls, Right, and Small Loops, Left. 
Whorls, R. 


S— 
& O 

nD | 

jor 

2 | 

alii | 80 5 
= 202 | 65 15 1 
S 99 | 16 O 0 
g 

NM 


Totals 497 292 170 130 50 


TABLE XXXVIII. 
Whorls, Right, and Large Loops, Left. 


Whorls, R. 


3 


Large Loops, L. 


| Totals | 497 


TABLE XXXIX. 
Whorls, Right, and Whorls, Left. 
Whorls, R. 


Whorls, L. 
Mey to OWNS 


| Totals | 861 497 292 170 130 50 | 2000 


TABLE XL. 
Whorls, Right, and Composites, Left. 
Whorls, R. 


Composites, L. |   0 |   1 |   2 |   3 |   4 |   5 | Totals
0              | 744 | 331 | 185 |  93 |  55 |  28 |   1436
1              |  98 | 132 |  78 |  58 |  52 |  16 |    434
2              |  16 |  28 |  19 |  16 |  19 |   5 |    103
3              |   3 |   5 |   7 |   2 |   4 |   1 |     22
4              |   0 |   1 |   2 |   1 |   0 |   0 |      4
5              |   0 |   0 |   1 |   0 |   0 |   0 |      1
Totals         | 861 | 497 | 292 | 170 | 130 |  50 |   2000




TABLE XLI.
Composites, Right, and Arches, Left.
Composites, R.

Arches, L. |    0 |   1 |   2 |   3 |   4 |   5 | Totals
0          | 1042 | 378 | 107 |  12 |   3 |   0 |   1542
1          |  233 |  46 |   9 |   2 |   0 |   0 |    290
2          |   84 |  16 |   2 |   0 |   0 |   0 |    102
3          |   40 |   2 |   0 |   0 |   0 |   0 |     42
4          |   20 |   0 |   0 |   0 |   0 |   0 |     20
5          |    4 |   0 |   0 |   0 |   0 |   0 |      4
Totals     | 1423 | 442 | 118 |  14 |   3 |   0 |   2000


TABLE XLII. 
Composites, Right, and Small Loops, Left. 


Composites, R. 


0 i 2 BS 4 5 Totals 
Sy See 
s 0 496 
Qy 1 425 
=) 2 366 
4 3 315 
= 4 283 
s 5 115 
5 
asta 2000 
TABLE XLIII. 
Composites, Right, and Large Loops, Left. 
Composites, R. 
; 0 1 i, 3 4 5 Totals 
| pent =e 
Spano 416 | 80 | 17 2 0 0 515 
= 1 416 | 121 36 3 1 (0) 577 
io) Zz, Arey || alial 27 6 1 (0) 423 
4 3 210 | 79 20 1 1 0 311 
0) 4 86 Bi 16 2 0) (0) 141 
orl > pals ae 2 Ole 0 0 33 
5 | 
Totals | 1423 | 442 118 TA tS (0) 2000 


TABLE XLIV. 
Composites, Right, and Whorls, Left. 
Composites, R. 


Whorls, Z. 
ABW Ss WHO 


Totals J 1423 | 442 118 14 3 0) 2000 
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TABLE XLV. 
Composites, Right, and Composites, Left. 


Composites, R. 


0 eee 3 J 
= 0 1100 12765 | 53) 4 6 1 
a 1 958°| 130°) 432) 43 0) 
ay 9g by 28 14 | 3 1 
S 3 8} 8 4 2 ) 
= 4 0 0 3 lal 30 1 
3 5 ) ) eid 0 0) 
0 
Totals | 1423 | 442 118 14 3 
ar TABLE XLVI. 
Arches and Small Loops, Both Hands. 
Arches. 
i) 1 2 B 4 5 6 if 8 9 10 {Totals 
0 361 7 1 1 1 |-0 0 0 0 0 2 373 
1 205 | 19 7 6 3. k= 30 ) 0 0 5 — 245 
g 168 33 12 7 ‘| (0) 5 2 5 — — 223 
} 139 36 13 9 S| 6 4 10 — — — 225 
105 40 25 10 2. | 6 10 — — —_ — 198 
102 | 33 25 18 10 10 — — — — — 198 
88 | 37 21 18 15 — — — — — — 179 
74 47 23 21 — —- — — — —_— 165 
638] 298. |. TS ha) ee eT 
45] QI = =< pe = = _ = _ = 66 
19-|° = = — se = = =5 — a _ 19 
1369 | 291 | 145 90 40 22 19 12 5 5 2 | 2000 
TABLE XLVII. 
| 
Arches and Large Loops, Both Hands. 
Arches. 
§ 
0 109 | 32 24 32 19 13 12 11 5 5 | 2 264 
if 164 | 55 34 23 ll 6 5 1 0) 0 — 299 
2 230 70. _|__33 18 4 3 2 (0) 0) — — 360 
3 225 41 27 9 4 (0) 0) 0) — — — 306 
4 212 | 45 16 5 1 0 0 = = = — 279 
5 160 | 22 9 2 OS XO = _ a= = = 193 
6 119 13 2; 1 1 | — — — — — — 136 
7 86 9 0 0) —_— — — — — — —_— 95 
8 49 3 (0) — — — — == =e 
9 1 — — — —_ — pes 
10 sy Pees ee | ey ee = 


oe Sle 


ee 


=. 


eS ae Ven eee 
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TABLE XLVIILI. 
Arches and Whorls, Both Hands. 
Arches. 
| 0 | 2 1 pee eS ee ae ae ee 
0 351 | 160 100 66 30 21 16 12 ii) 5 2 768 
1 240 69 25 16 7 1 2 0) O O — 360 
2 191 38 ll 5 2 (0) 1 0 O — — 248 
BS 8 146 14 8 Dy 1 10) 0) O — — — 171 
Fly 124] 6 1 1 0 0 63 = 132 
2 5 86 3 (0) (0) (6) O — — — — — 89 
= 6 ah 1 (6) (0) O — — — 78 
t 71 O O 10) — — — —_ — — — 71 
8 47| 0 0 ae = 47 
9 23 (0) — = — — | 23 
10 13 Sly _— — 13 
| Totals | 1369 | 291 145 90 40 22, 19 12 5 5 2 2000 
| 
TABLE XLIX. 


Arches and Composites, Both Hands. 


Arches. 


eee ee ote.) 7 |. 2 | 
650 |193 | 101 | 62 | 33 1 
40g | 66 | 34 | 19 6 

‘ 202 | 92 ale ty i 

8 3 1 g 

2 0 1 

oy 0 0 0 

g 0) (0) (0) 

} 0 0 ot 

e) (0) — a 


40 


TABLE L.
Small Loops and Large Loops, Both Hands. 


L. 


Small Loops. 


0 1 2 3 4 5 6 ” 8 9 10 

0 44 13 10 18 17 20 16 | cot 31 30 19 

1 44 19 13 19 14 30 37 45 42 BG alse 

a 2 60 94 | 24 S10) | Bin 42 54 55 SG ip hee 
a| 3 5) 31 25 39 43 47 32 Onn ees || ee eee 
5 y 49 34 | 41 44 | 46 40 25 ea mene iL se 
om 5 35 39 38 33 29 19 = = = = aa 
AG 34 | 32 34. 22 TOES | st TR ee al a ec a (ae 
= 7 24 | 99 | 29 Oe | reise fl, (aed RS ae ne 
eS 8 16 20 13 ye ee oh | eee ee 
9 9 Oe RS cee Se tL" a a (bee 

10 By a a a ee ee 

= j | 
Totals] 373 | 245 | 223 | 225 | 198 | 198 | 179 | 165 | 109 66 19 


Totals 


264 
299 
360 
306 
279 
193. 
136 

95 

52 

13 
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TABLE LI. 
Small Loops and Whorls, Both Hands. 
Small Loops. 


Whorls. 


Totals 24! 2° Hi 179 


TABLE LII.
Small Loops and Composites, Both Hands. 
Small Loops. 


| Beales | 7 | 8 | 9 | 10 |Totals 
0 1098 
1 536 
E; a 240 
mt Mer es 
nD 4 6 
S. i) 
= ‘ 
j=) 
iS) 
Totals | 373 
TABLE LIII. 
Large Loops and Whorls, Both Hands. 
‘Large Loops. 
an si 2 8 J a | @ | y | 8 9 | 10 [Totals 
0 156 138 133 91 73 51 43 42 29 9 3 
1 24 45 54 63 56 39 30 29 16 4 — 
2 12 24 36 35 50 3) 30 19 7 — — 
a 3 12 16 30 21 35 35 17 5 — a= — 
cele 8 7 | 20° | 32 | 96 | 23 | 16 
fe 5 3 8 20 20mg 928 10 — -- — — |— 
= 6 4 9 24 30 11 — —_— 
if 10 18 29 14 — — — aoe 
8 11 22 14 — — 
9 11 12 — a = -- — — 
10 13 (etna ie | ra eee Ba || 
— 
Totals J 264 | 299 360 306 279 193 136 95 52 13 3 


Composites. 


Composites. 
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TABLE LIV. 


Large Loops and Composites, Both Hands. 


Large Loops. 


Whorls and Composites, Both Hands. 


TABLE LV. 


Whorls. 


0 tr | 2 3 5 5 6 : By ano 10 |Totals 
0 26 1098 
1 40 536 
2 12 240 
3 8 85 
y 3 26 
5 ) Tl 
6 ai 6 
7 as l 
8 ai 1 
9 ae 0 
10 be 0 
Totals 248 | 171 | 132 | 89 2000 
TABLE LVI. 
Right Thumb and Index.
Right Thumb.

Right Index |  A |  SL |  LL |   W |   C | Totals
A           | 29 |  97 | 148 |  50 |  28 |    352
SL          | 12 | 125 | 320 | 139 |  58 |    654
LL          |  2 |  27 | 149 | 125 |  36 |    339
W           |  1 |  26 | 144 | 260 |  50 |    481
C           |  2 |  15 |  54 |  75 |  28 |    174
Totals      | 46 | 290 | 815 | 649 | 200 |   2000
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Right Middle Finger. 


Right Little Finger. 


Right Middle Finger. 


Right Ring Finger. 
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TABLE LVII. 
Right Thumb and Middle Finger. 
Right Thumb. 


A f 
SL 18 
LL 1 
W 2 
C 0) 
Totals 46 


290 


SL 


LL 


428 
223 
69 
20 


815 


W 


31 
215 
208 
160 
35 


649 


200 


Totals 


TABLE LVIII. 
Right Thumb and Ring Finger.
Right Thumb. 


| 


A SL bh We C Totals 
BL 5 60 | 248 166 64 543 
W 8 55 | 929 | 355 82 729 
eee 2 22 73 57 22 176 
| Totals | 46 290 | 815 | 649 | 200 2000 
TABLE LIX. 
Right Thumb and Little Finger. 
Right Thumb. 
Right Little Finger |  A |  SL |  LL |   W |   C | Totals
A                   |  8 |  14 |   5 |   3 |   1 |     31
SL                  | 30 | 198 | 414 | 162 |  66 |    870
LL                  |  5 |  61 | 300 | 304 |  94 |    764
W                   |  1 |  10 |  58 | 137 |  22 |    228
C                   |  2 |   7 |  38 |  43 |  17 |    107
Totals              | 46 | 290 | 815 | 649 | 200 |   2000
TABLE LX. 
Right Index and Middle Finger. 
Right Index. 
Right Middle Finger |   A |  SL |  LL |   W |   C | Totals
A                   | 127 |  69 |   9 |   5 |   2 |    212
SL                  | 187 | 453 | 119 | 104 |  58 |    921
LL                  |  30 | 108 | 144 | 172 |  62 |    516
W                   |   4 |  16 |  48 | 178 |  28 |    274
C                   |   4 |   8 |  19 |  22 |  24 |     77
Totals              | 352 | 654 | 339 | 481 | 174 |   2000


—_— sl a 
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TABLE LXI.- 
Right Index and Ring Finger.
Right Index. 


Totals 


Right Ring Finger. 


TABLE LXII. 
Right Index and Little Finger. 
Right Index. 
ie isn en ewe) eo leretats 


Right Little Finger. 


TABLE LXIII. 
Right Middle and Ring Fingers. 
Right Middle Finger. 


Right Ring Finger. 


TABLE LXIV. 

Right Middle and Little Fingers.
Right Middle Finger. 

| Ww 


Right Little Finger. 


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TABLE LXV. 
Right Ring and Little Fingers. 


Right Ring Finger. 


S SL Totals 
& 
ey 31 
o | 870 
= | 764 
tS | 228 
7m | 107 
a aad 
i] 543 2000 
= | 
TABLE LXVI.
Left Thumb and Index.
Left Thumb.

Left Index |  A |  SL |  LL |   W |   C | Totals
A          | 47 | 133 |  78 |  25 |  30 |    313
SL         | 32 | 313 | 355 |  68 |  65 |    833
LL         |  5 |  40 | 143 |  47 |  47 |    282
W          |  3 |  46 | 136 | 166 |  86 |    437
C          |  4 |  15 |  55 |  35 |  26 |    135
Totals     | 91 | 547 | 767 | 341 | 254 |   2000

TABLE LXVII. 
Left Thumb and Middle Finger. 
Left Thumb. 

i SL Totals 
to} 0) 
A=] y 
ew 
© SL 
es | ie 
o| W 
= C 
3 | Total 
| otals 


Left Ring Finger. 


| Totals 


TABLE LXVIII.
Left Thumb and Ring Finger. 
Left Thumb. 


7 
264 | 198 
152 | 342 
60 | 172 
33 48 | 


547 767 


5 
38 
110 
158 
30 


341 


Totals 


66 
583 
712 
491 
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TABLE LXIX. 
Left Thumb and Little Finger. 
Left Thumb. 
a Totals 
oO 
bp 
42) 35 
Falla aCye 959 
ae LL 768 
BS W 150 
I C 88 
ney 
3 Totals 
TABLE LXX. 
Left Index and Middle Finger. 
Left Index. 
Left Middle Finger |   A |  SL |  LL |   W |   C | Totals
A                  | 117 |  84 |   9 |   2 |   3 |    215
SL                 | 166 | 525 |  80 |  79 |  37 |    887
LL                 |  19 | 187 | 157 | 139 |  54 |    556
W                  |   6 |  25 |  21 | 171 |  17 |    240
C                  |   5 |  12 |  15 |  46 |  24 |    102
Totals             | 313 | 833 | 282 | 437 | 135 |   2000


TABLE LXXI. 


Left Index and Ring Finger. 
Left Index. 


Left Ring Finger. 


SL LL 


833 


TABLE LXXII. 
Left Index and Little Finger.


Left Index. 


Left Little Finger. 


SL LL Ww 


Totals 


471 


472 


Left Little Finger. Left Ring Finger. 


Left Little Finger. 


Left Thumb. 
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TABLE LXXIII. 
Left Middle and Ring Fingers. 


Left Middle Finger. 


Totals 


| A S L 


13 
421 
311 

88 

54 


215 887 


LL W 


556 240 


102 


Totals 


66 
583 
712 
491 
148 


2000 


TABLE LXXIV. 
Left Middle and Little Fingers.


Left Middle Finger. 


SLI 


W 


Totals 


Totals | 215 887 556 240 102 2000 
——— 
TABLE LXXV. 
Left Ring and Little Fingers.
Left Ring Finger. 
| PO Si Pe | W | Cc | Totals 
c (0) 1 35 
99 44 959 
212 73 768 
120 15 150 
60 15 88 
Totals | 66 | 583 | 712 | 491 | 148 | 2000 
TABLE LXXVI. 
Right Thumb and Left Thumb. 
Right Thumb. 
A Gn i Wee | motte): 
A 91 
SL 547 
LL 767 
Ww 341 
C 254 
Totals 2000 


aa de 




TABLE LXXVII. 
Right Thumb and Left Index. 
Right Thumb. 


A SL LL WwW C Totals 
3 A 313 
s SL 833 
& aE 282 
ie W 437 
Gay 
D 135 
om e [SS 
Totals 290 | 815 | 649 2000 
TABLE LXXVIII.
Right Thumb and Left Middle Finger.
Right Thumb.

Left Middle Finger |  A |  SL |  LL |   W |   C | Totals
A                  | 23 |  60 |  86 |  32 |  14 |    215
SL                 | 17 | 167 | 414 | 210 |  79 |    887
LL                 |  0 |  40 | 232 | 214 |  70 |    556
W                  |  5 |  18 |  57 | 141 |  19 |    240
C                  |  1 |   5 |  26 |  52 |  18 |    102
Totals             | 46 | 290 | 815 | 649 | 200 |   2000


TABLE LXXIX. 
Right Thumb and Left Ring Finger. 


Right Thumb 


. 


Left Ring Finger. 


TABLE LXXX. 
Right Thumb and Left Little Finger. 


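Right Thumb.

Left Little Finger |  A |  SL |  LL |   W |   C | Totals
A                  | 10 |  17 |   6 |   2 |   0 |     35
SL                 | 29 | 207 | 468 | 181 |  74 |    959
LL                 |  2 |  54 | 275 | 341 |  96 |    768
W                  |  3 |   8 |  40 |  80 |  19 |    150
C                  |  2 |   4 |  26 |  45 |  11 |     88
Totals             | 46 | 290 | 815 | 649 | 200 |   2000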
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Left Index. 


Association of Finger-Prints 


TABLE LXXXI. 


Right Index and Left Thumb. 


Right Index. 


Totals 


A SL LL W C 
a 
lalla 91 
= SL 547 
ale Ae 767 
iS W 341 
o C 254 
SSS SST 
Totals 352 654 339 481 174 2000 
TABLE LXXXII.
Right Index and Left Index. 
Right Index. 
| aol st. sz, \ cee) re wr omlinee 
A 313 
SL, 295 
SLy 538 
1H 88 
LL, 194 
hee 437 | 
C 135 
Totals 352 282 372 174 165 481 174 2000 
TABLE LXXXIII. 
Right Index and Left Middle Finger. 
Right Index. 
Be 
on 
S 
i A 
© SL 
= LL 
Ss W 
S| @ 
— 
3 Totals 


TABLE LXXXIV. 
Right Index and Left Ring Finger. 
Right Index. 


Left Ring Finger. 


Totals | 


66 
583 
712 
491 
148 




TABLE LXXXV. 
Right Index and Left Little Finger. 
Right Index. 


TABLE LXXXVI. 
Right Middle Finger and Left Thumb. 


Right Middle Finger. 


x | ee Se ye ech Totals. | 
0 | | 
= A ¢ 35 | 
Ba SE) i 236.425 12 nae) 192 50 959 | 
=| LL 81 | 183 | 176 | 224 | 104 768 | 
Ss W 5 21 30 86 8 150 | 
HH 6 6 5 17 49 11 88 | 
= | | 
—j | Totals}: 352 | 654 | 339 | 481 


Totals 
fe) 
S 91 
= 547 
a 767 
a 341 
Bs 254 
4 
2000 
TABLE LXXXVII. 
Right Middle Finger and Left Index. 
Right Middle Finger. 
Left Index |   A |  SL |  LL |   W |  C | Totals
A          | 110 | 176 |  21 |   1 |  5 |    313
SL         |  86 | 529 | 177 |  24 | 17 |    833
LL         |   7 |  95 | 125 |  42 | 13 |    282
W          |   5 |  80 | 144 | 180 | 28 |    437
C          |   4 |  41 |  49 |  27 | 14 |    135
Totals     | 212 | 921 | 516 | 274 | 77 |   2000
TABLE LXXXVIII.
Right Middle and Left Middle Fingers.
Right Middle Finger.

Left Middle Finger |   A |  SL |  LL |   W |  C | Totals
A                  | 115 |  94 |   4 |   0 |  2 |    215
SL                 |  84 | 635 | 129 |  25 | 14 |    887
LL                 |   8 | 152 | 295 |  70 | 31 |    556
W                  |   3 |  20 |  65 | 140 | 12 |    240
C                  |   2 |  20 |  23 |  39 | 18 |    102
Totals             | 212 | 921 | 516 | 274 | 77 |   2000
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Left Thumb. Left Little Finger. Left Ring Finger, 


Left Index. 




TABLE LXXXIX. 


Right Middle and Left Ring Fingers. 


Right Middle Finger. 


Left Ring Finger |   A |  SL |  LL |   W |  C | Totals
A                |  49 |  16 |   1 |   0 |  0 |     66
SL               | 110 | 398 |  59 |   8 |  8 |    583
LL               |  39 | 346 | 241 |  63 | 23 |    712
W                |   9 | 102 | 165 | 184 | 31 |    491
C                |   5 |  59 |  50 |  19 | 15 |    148
Totals           | 212 | 921 | 516 | 274 | 77 |   2000

TABLE XC. 
Right Middle and Left Little Fingers.
Right Middle Finger. 

SL Totals 
13 35 
959 
289 260 137 47 768 
30 57 56 3 150 
18 3l 28 8 88 
2000 


921 516 274 77 


TABLE XCI. 
Right Ring Finger and Left Thumb. 
Right Ring Finger. 


Totals 
17 
SL 121 49 547 
LL 262 Tes 767 
W 3 28 88 199 23 341 
C g 130 254 
Totals 729 2000 
TABLE XCII. 
Right Ring Finger and Left Index. 
Right Ring Finger. 
4 ae aa w | @ | Totals 
313 
833 
282 
437 
135 
2000 


Left Thumb. 


Left Ring Finger. Left Middle Finger. 


Left Little Finger. 


Right Ring and Left Middle Fingers. 
Right. Ring Finger. 




TABLE XCIII. 


| Ay ese eet ye Ce “1 Totals 
| 47 122 SS | Peg) race et 215 
15 886 meas |e ais 83 887 

1 24 | 204 | 266 61 556 

0 5 96 |), 1949/15 240 

0 2 1 77 12 102 

63 | 489 | 543 | 729 | 176 } 2000 

TABLE XCIV. 
Right Ring and Left Ring Fingers. 
Right Ring Finger. 

A | SZ | LL | w | Cc | Totals 

| 66 

| 583 

| 712 

491 


Totals 


148 


TABLE XCV. 


Right Ring and Left Little Fingers. 


Right Ring Finger. 


Left Little Finger |  A |  SL |  LL |   W |   C | Totals
A                  | 17 |  15 |   3 |   0 |   0 |     35
SL                 | 42 | 399 | 250 | 194 |  74 |    959
LL                 |  4 |  68 | 276 | 339 |  81 |    768
W                  |  0 |   5 |   9 | 125 |  11 |    150
C                  |  0 |   2 |   5 |  71 |  10 |     88
Totals             | 63 | 489 | 543 | 729 | 176 |   2000


TABLE XCVI. 


Right Little Finger and Left Thumb. 


Right Little Finger. 


in soi 


W 


Totals 


91 
547 
767 
341 
254 




477 


4 


( 


8 




TABLE XCVII. 
Right Little Finger and Left Index. 


Right Little Finger. 


Left Index |  A |  SL |  LL |   W |   C | Totals
A          | 22 | 204 |  76 |   6 |   5 |    313
SL         |  8 | 477 | 269 |  44 |  35 |    833
LL         |  0 |  76 | 155 |  34 |  17 |    282
W          |  1 |  73 | 198 | 128 |  37 |    437
C          |  0 |  40 |  66 |  16 |  13 |    135
Totals     | 31 | 870 | 764 | 228 | 107 |   2000
TABLE XCVIII. 
Right Little and Left Middle Fingers.
Right Little Finger. 
5 ASM asia eee W | @ | Totals | 
ee 3 —— 
a A 17 5) OSes 2 3 215 
ie Ys 12 507-291 42 35 887 
sally ea) 0 152-301 71 32 556 
Sy) 2 al eo a4 26 240 
sl 0 14 48 29 11 102 
= | | | 
3 Totals | 31 870 | 764 | 228 | 107 | 2000 
TABLE XCIX. 
Right Little and Left Ring Fingers.
Right Little Finger. 
i SEE W C | Totals 
a | | 
= 50 0 1 66 
os 440 10 9 13 583 
op P 255 | 398 30 27 712 
= G0. | ST 78e leas 50 491 
pa 1 36 79 16 16 148 
2 
o | | 
=a) 31 870 | 764 | 928.0!) 107 


TABLE C.
Right Little and Left Little Fingers.
Right Little Finger.

Left Little Finger |  A |  SL |  LL |   W |   C | Totals
A                  | 18 |  17 |   0 |   0 |   0 |     35
SL                 | 11 | 721 | 176 |  21 |  30 |    959
LL                 |  1 | 115 | 543 |  74 |  35 |    768
W                  |  1 |  13 |  22 |  99 |  15 |    150
C                  |  0 |   4 |  23 |  34 |  27 |     88
Totals             | 31 | 870 | 764 | 228 | 107 |   2000


ON THE PROBLEM OF SEXING OSTEOMETRIC MATERIAL


By KARL PEARSON, F.R.S.


It is well known that anthropometric, and particularly craniometric, measurements
give frequency series which, for moderate sized populations, follow closely the normal
or Laplace-Gaussian distribution. Measurements of stature, cubit, head-length,
cephalic index, etc., obey with sufficient accuracy for most purposes of science
the normal law. This statement may with a high degree of certainty be extended
to practically all measurements on the adult skeleton. But a new difficulty
arises in dealing with the parts of the skeleton: the sexing of the several bones of
the human body is by no means certain, and this is especially the case when we
come to deal, not with the cranium or the pelvis, but with the long bones. In
order to get over this difficulty, and to find the constants for each sex, it occurred
to me some years back, when the sexing of the long bones had presented this
problem very forcibly to workers in my laboratory, that the method of my first
contribution to the mathematical theory of evolution* might be applied. Namely,
we might take the unsexed material and assume it to consist of a compound of
male and female data, the frequency curve for each of these being normal; the
two components might then be found in the manner of the paper just referred to.
The method was especially likely to be successful when the series was otherwise
homogeneous, the numbers large and the character dealt with substantially
differentiated sexually. Of course the method does not give the sex of each individual
bone, but I have shown in another memoir†, how four to six characters thus
resolved form a basis for determining the probable sex of each bone, and this with
an accuracy which is very probably as great as, or even greater than, anatomical
appreciation unbased on a system of numerical measurement.


One of the few objections to the method is the labour involved in the process.
While the analysis required in the application of the method is not so severe that
it has not been applied in a large number of cases by workers in the Biometric
Laboratory, it is still considerably beyond the powers of most of the present
workers in anthropometry, and probably no anatomist of the present day has the
mathematical knowledge requisite for the solution of the reducing nonic, or the
arithmetical patience required for the calculation of its coefficients. It has occurred
to me, however, that the work might be considerably shortened by the following
considerations. The bones usually dealt with are those found in ancient cemeteries,
in plague pits, clearance pits or crypts. It is probable, though by no means
certain, that adult female bones in such cases would be rather more numerous
than male. On the other hand, being somewhat smaller, they are asserted by some
writers as likely to be more frequently broken, and they certainly may more readily
escape preservation or measurement. If we take these two causes as counteracting
each other, we may assume as a first approximation that the numbers of
male and female bones will be equal. In the next place it is a result of much
anthropometric experience that male and female variations, i.e. their standard
deviations, are closely alike. These again we can take equal to a first approximation.
Accordingly, to this first approximation, our osteometric series may be considered
to consist of two equal normal components with different means. Let the mean
of the unsexed material be $M$, and let the actual means of the sexed components be
$m_1$, $m_2$, their standard deviations be $\sigma_1$, $\sigma_2$, and their total frequencies $n_1$ and $n_2$,
where the subscript 1 refers, say, to the males, and 2 to the females. Then $m_1$,
$m_2$, $\sigma_1$, $\sigma_2$, $n_1$ and $n_2$ are the quantities we desire to discover. Let the moment-coefficients
of the total material be, in the usual notation, $\mu_2$, $\mu_3$, $\mu_4$, $\mu_5$, and let
$N\,(= n_1 + n_2)$ be the total unsexed population. We shall write as customary
$\beta_1 = \mu_3^2/\mu_2^3$, $\beta_2 = \mu_4/\mu_2^2$, $\beta_3 = \mu_3\mu_5/\mu_2^4$. Then, if our hypothesis be correct and the
material consist very nearly of two equal normal distributions, $\beta_1$ and $\beta_3$ ought
to be very small, while $\beta_2$ will be large in relation to them.


* Phil. Trans. Vol. 185, A, pp. 71-110.
† To appear in the next number of this Journal.


It is convenient also to write: 


\[ \epsilon_1 = \tfrac{1}{2}(3 - \beta_2), \qquad \epsilon_2 = 10\beta_1 - \beta_3 \tag{i} \]
\[ m_1 = M + \gamma_1, \qquad m_2 = M + \gamma_2 \tag{ii} \]
\[ q_2 = \gamma_1\gamma_2/\mu_2, \qquad q_1 = (\gamma_1 + \gamma_2)/\sqrt{\mu_2} \tag{iii} \]

Ce AU RRR AR ALB GRHAB Me clin OnISHE GrO4090 5» (iv). 


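Then the fundamental nonic may be written:
\[
\begin{aligned}
q_2^9 &- 7\epsilon_1 q_2^7 + \tfrac{3}{2}\beta_1 q_2^6 - 3(\epsilon_2 - 5\epsilon_1^2)\,q_2^5 - \Big(37\beta_1\epsilon_1 + \tfrac{3}{4}\frac{\epsilon_2^2}{\beta_1}\Big) q_2^4 \\
&+ 3(4\beta_1^2 - 3\epsilon_1\epsilon_2 - 3\epsilon_1^3)\,q_2^3 + 3(\beta_1\epsilon_2 - \tfrac{7}{2}\beta_1\epsilon_1^2)\,q_2^2 + 8\beta_1^2\epsilon_1 q_2 - \beta_1^3 = 0
\end{aligned}
\tag{v}
\]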


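Further:
\[ q_1 = \frac{\sqrt{\beta_1}\,\{\beta_1 - 6\epsilon_1 q_2 - \tfrac{3}{2}(\epsilon_2/\beta_1)\,q_2^2 - 4q_2^3\}}{q_2\,\{2\beta_1 - 3\epsilon_1 q_2 + q_2^3\}} \tag{vi} \]
where the sign of $\sqrt{\beta_1}$ is determined by that of $\mu_3$.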
Again i EE ip i DON Se a (vii). 
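Lastly:
\[
\left.
\begin{aligned}
\sigma_1^2 &= \mu_2(1 + q_2) - \tfrac{1}{3}\mu_3/\gamma_2 - \tfrac{1}{3}\sqrt{\mu_2}\,q_1\gamma_1 \\
\sigma_2^2 &= \mu_2(1 + q_2) - \tfrac{1}{3}\mu_3/\gamma_1 - \tfrac{1}{3}\sqrt{\mu_2}\,q_1\gamma_2
\end{aligned}
\right\} \tag{viii}
\]
Equations (ii), (iii), (iv), (v), (vi), (vii) and (viii) form the complete solution of
the problem when we make no approximations whatever*.
If, however, $\beta_1 = \beta_3 = 0$, then, the two components being equal, we have†:
\[
\left.
\begin{aligned}
n_1 &= n_2 = \tfrac{1}{2}N \\
\gamma_1 &= -\gamma_2 = \sqrt{\mu_2}\,\epsilon_1^{1/4} \\
\sigma_1 &= \sigma_2 = \sqrt{\mu_2}\,\{1 - \sqrt{\epsilon_1}\}^{1/2}
\end{aligned}
\right\} \tag{ix}
\]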
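It will be seen that it is needful, in order that the solution may be real, that
$\epsilon_1$ should be positive or $\beta_2 < 3$, i.e. the total frequency should be platykurtic. Now
let us suppose that the values given by (ix) are a first approximation and that we
need a second approximation in which the two normal curves will be unequal in
frequency, mean and standard deviation. Write:
\[ n = \tfrac{1}{2}N, \qquad \gamma = \sqrt{\mu_2}\,\epsilon_1^{1/4}, \qquad \sigma = \sqrt{\mu_2}\,(1 - \sqrt{\epsilon_1})^{1/2} \tag{ix$'$} \]
and suppose:
\[
\begin{aligned}
n_1 &= n + \delta n_1, & n_2 &= n + \delta n_2, \\
\gamma_1 &= \gamma + \delta\gamma_1, & \gamma_2 &= -\gamma + \delta\gamma_2, \\
\sigma_1 &= \sigma + \delta\sigma_1, & \sigma_2 &= \sigma + \delta\sigma_2,
\end{aligned}
\]
where the differentials represent small quantities of which the squares and
products may be neglected to a second approximation.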
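Our equations are‡:
\[
\begin{aligned}
n_1 + n_2 &= N, \\
n_1\gamma_1 + n_2\gamma_2 &= 0, \\
n_1(\gamma_1^2 + \sigma_1^2) + n_2(\gamma_2^2 + \sigma_2^2) &= N\mu_2, \\
n_1(\gamma_1^3 + 3\gamma_1\sigma_1^2) + n_2(\gamma_2^3 + 3\gamma_2\sigma_2^2) &= N\mu_3, \\
n_1(\gamma_1^4 + 6\gamma_1^2\sigma_1^2 + 3\sigma_1^4) + n_2(\gamma_2^4 + 6\gamma_2^2\sigma_2^2 + 3\sigma_2^4) &= N\mu_4, \\
n_1(\gamma_1^5 + 10\gamma_1^3\sigma_1^2 + 15\gamma_1\sigma_1^4) + n_2(\gamma_2^5 + 10\gamma_2^3\sigma_2^2 + 15\gamma_2\sigma_2^4) &= N\mu_5.
\end{aligned}
\]
We now differentiate these and after differentiation put
\[ n_1 = n_2 = n, \qquad \gamma_1 = -\gamma_2 = \gamma, \qquad \sigma_1 = \sigma_2 = \sigma. \]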
Hence we find: 

\[ \delta n_1 + \delta n_2 = 0 \tag{x} \]
\[ n(\delta\gamma_1 + \delta\gamma_2) + 2\gamma\,\delta n_1 = 0 \tag{xi} \]
\[ 2n\gamma(\delta\gamma_1 - \delta\gamma_2) + 2n\sigma(\delta\sigma_1 + \delta\sigma_2) = 0 \tag{xii} \]
\[ 3n(\delta\gamma_1 + \delta\gamma_2)(\gamma^2 + \sigma^2) + 2\gamma\,\delta n_1(\gamma^2 + 3\sigma^2) + 6n\gamma\sigma(\delta\sigma_1 - \delta\sigma_2) = N\mu_3 \tag{xiii} \]
\[ \gamma(\gamma^2 + 3\sigma^2)(\delta\gamma_1 - \delta\gamma_2) + 3\sigma(\gamma^2 + \sigma^2)(\delta\sigma_1 + \delta\sigma_2) = 0 \tag{xiv} \]
\[ n(\delta\gamma_1 + \delta\gamma_2)(5\gamma^4 + 30\gamma^2\sigma^2 + 15\sigma^4) + 2\,\delta n_1\,\gamma(\gamma^4 + 10\gamma^2\sigma^2 + 15\sigma^4) + n(\delta\sigma_1 - \delta\sigma_2)\,20\gamma\sigma(\gamma^2 + 3\sigma^2) = N\mu_5 \tag{xv} \]
where it must be remembered that the differential terms are introduced solely to
account for the asymmetry as represented by $\mu_3$ and $\mu_5$, assumed to be zero to
a first approximation.


* They are, in a somewhat better form, those originally given by me in Phil. Trans. Vol. 185, A,
1894, pp. 71-110; see Equations (14), (15), (18), (19), (27) and (29) of that memoir.

† Loc. cit. footnote, p. 91.

‡ Loc. cit. p. 182.


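But (xii) and (xiv) show us that we must have:
\[ \delta\gamma_1 = \delta\gamma_2, \qquad \delta\sigma_1 = -\delta\sigma_2. \]
Hence from (xi):
\[ \delta n_1 = -\delta n_2 = -n\,\delta\gamma_1/\gamma \tag{xvi} \]
(xiii) now becomes:
\[ 2\gamma^2\,\delta\gamma_1 + 6\gamma\sigma\,\delta\sigma_1 = \mu_3, \]
and (xv):
\[ 4\gamma^2\,\delta\gamma_1(\gamma^2 + 5\sigma^2) + 20\gamma\sigma(\gamma^2 + 3\sigma^2)\,\delta\sigma_1 = \mu_5. \]
Whence solving we find:
\[ \frac{\delta\gamma_1}{\gamma} = \frac{\delta\gamma_2}{\gamma} = \frac{5}{4}\Big(1 + 3\frac{\sigma^2}{\gamma^2}\Big)\frac{\mu_3}{\gamma^3} - \frac{3}{8}\frac{\mu_5}{\gamma^5} \tag{xvii} \]
\[ \frac{\delta\sigma_1}{\sigma} = -\frac{\delta\sigma_2}{\sigma} = -\frac{\gamma^2}{\sigma^2}\Big\{\frac{1}{4}\Big(1 + 5\frac{\sigma^2}{\gamma^2}\Big)\frac{\mu_3}{\gamma^3} - \frac{1}{8}\frac{\mu_5}{\gamma^5}\Big\} \tag{xviii} \]
\[ \frac{\delta n_1}{n} = -\frac{\delta n_2}{n} = -\frac{\delta\gamma_1}{\gamma} \tag{xix} \]


These form together with (ix)$'$ the complete solution of the problem.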


The following example illustrates the procedure: 541 measurements were made 
of the bicondylar width of English femora, right and left, male and female being 
mixed. The frequency below resulted. 


Frequency Distribution of 541 Femora for Bicondylar Width. 


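 mm.   Frequency |  mm.   Frequency |  mm.   Frequency
  61      1      |   71     23      |   81     28
  62      1      |   72     33.5    |   82     23
  63      1.5    |   73     25      |   83     19
  64      5      |   74     22      |   84     17.5
  65     13.5    |   75     36      |   85     19.5
  66     14      |   76     25.5    |   86     16.5
  67     15.5    |   77     29.5    |   87      7.5
  68     22      |   78     32.5    |   88      3
  69     31      |   79     19.5    |   89      3.5
  70     19      |   80     33      |   90      0.5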


The constants of this distribution were:
\[ M = 75.8152, \]
\[ \mu_2 = 37.692112, \qquad \mu_3 = -2.587693, \]
\[ \mu_4 = 3020.898695, \qquad \mu_5 = -83.260992. \]
Hence we deduce:
\[ \beta_1 = 0.000125047, \qquad \beta_3 = 0.000106750, \]
\[ \beta_2 = 2.126349, \qquad \epsilon_1 = 0.4368255, \]
\[ \epsilon_2 = 0.00114372. \]
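
The mean and second moment quoted above can be checked directly from the grouped frequencies. The following is a minimal Python sketch, not the author's own working. It rests on two assumptions not stated in the text: that the frequencies for 81 to 90 mm are the sums of the female and male columns of the anatomically sexed table given later in the paper, and that Sheppard's correction (subtracting $h^2/12$ for a grouping width $h$ = 1 mm) was applied to the second moment. The function name `mean_and_mu2` is hypothetical.

```python
# Grouped frequencies: bicondylar width in mm -> number of femora.
# ASSUMPTION: the 81-90 mm counts are female + male counts taken from the
# anatomically sexed table; the 61-80 mm counts are as printed.
FREQ = {
    61: 1, 62: 1, 63: 1.5, 64: 5, 65: 13.5,
    66: 14, 67: 15.5, 68: 22, 69: 31, 70: 19,
    71: 23, 72: 33.5, 73: 25, 74: 22, 75: 36,
    76: 25.5, 77: 29.5, 78: 32.5, 79: 19.5, 80: 33,
    81: 28, 82: 23, 83: 19, 84: 17.5, 85: 19.5,
    86: 16.5, 87: 7.5, 88: 3, 89: 3.5, 90: 0.5,
}


def mean_and_mu2(freq, width=1.0):
    """Return (total, mean, Sheppard-corrected second moment) of grouped data."""
    total = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / total
    raw_mu2 = sum(f * (x - mean) ** 2 for x, f in freq.items()) / total
    # ASSUMPTION: Sheppard's correction for grouping is subtracted here.
    return total, mean, raw_mu2 - width ** 2 / 12.0


N, M, MU2 = mean_and_mu2(FREQ)
print(N, round(M, 4), round(MU2, 4))
```

Under these assumptions the total, the mean and the second moment agree with the printed $N = 541$, $M = 75.8152$ and $\mu_2 = 37.692112$ to the figures quoted.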
Clearly $\beta_1$ and $\beta_3$ are so small that the distribution fulfils our condition of
being very closely symmetrical. The nonic, equation (v) above, is:
\[
\begin{aligned}
q_2^9 &- 3.057789\,q_2^7 + 0.00018757\,q_2^6 + 2.8588174\,q_2^5 \\
&- 0.0098679\,q_2^4 - 0.754678\,q_2^3 - 0.000250114\,q_2^2 \\
&+ 0.0000000546\,q_2 - 0.000000000005 = 0,
\end{aligned}
\]
the last two terms being written down to many figures to show their
inappreciableness. The root required is:
\[ q_2 = -0.65679, \]
which by (vi) leads to:
\[ \gamma^2 + 0.558050\,\gamma - 24.755802 = 0, \]
and provides the solution:

                        Females.       Males.
Mean:                   70.547 mm.     80.526 mm.
Total Frequency:        255.4          285.6           ...(A).
Standard Deviation:     3.4842 mm.     3.6944 mm.
Modal Ordinate*:        29.24          30.84
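
The numerical coefficients of the nonic can be rebuilt from the printed moments by way of $\beta_1$, $\beta_2$, $\beta_3$, $\epsilon_1$ and $\epsilon_2$. The sketch below, in Python, assumes the coefficient pattern of equation (v) as reconstructed here from the printed numbers, and compares the result with the printed coefficients. The names `COEFFS`, `PRINTED` and `horner` are illustrative only. Agreement is to about the fourth or fifth decimal place, which is what hand arithmetic of the period would be expected to give.

```python
# Moment-coefficients of the 541 femora, as printed in the text.
mu2, mu3, mu4, mu5 = 37.692112, -2.587693, 3020.898695, -83.260992

beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
beta3 = mu3 * mu5 / mu2 ** 4
eps1 = 0.5 * (3.0 - beta2)
eps2 = 10.0 * beta1 - beta3

# Coefficients of q2^9, q2^8, ..., q2^0 following equation (v).
COEFFS = [
    1.0,
    0.0,
    -7.0 * eps1,
    1.5 * beta1,
    -3.0 * (eps2 - 5.0 * eps1 ** 2),
    -(37.0 * beta1 * eps1 + 0.75 * eps2 ** 2 / beta1),
    3.0 * (4.0 * beta1 ** 2 - 3.0 * eps1 * eps2 - 3.0 * eps1 ** 3),
    3.0 * (beta1 * eps2 - 3.5 * beta1 * eps1 ** 2),
    8.0 * beta1 ** 2 * eps1,
    -beta1 ** 3,
]

# Coefficients as printed in the worked example, in the same order.
PRINTED = [1.0, 0.0, -3.057789, 0.00018757, 2.8588174,
           -0.0098679, -0.754678, -0.000250114, 5.46e-8, -5e-12]


def horner(coeffs, q):
    """Evaluate the polynomial with the given descending coefficients at q."""
    value = 0.0
    for c in coeffs:
        value = value * q + c
    return value


for ours, theirs in zip(COEFFS, PRINTED):
    print(f"{ours: .10f}  {theirs: .10f}")
```

`horner(COEFFS, q)` can then be used with any root-finding routine to locate the negative real root near $-\sqrt{\epsilon_1}$.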


We have now to inquire how far the same result would be reached, if we had 
supposed as a first approximation equal Gaussian components and then proceeded 
to determine a second approximation by aid of (xvii) to (xix). 


Equations (ix) give us:
\[ n_1 = n_2 = 270.5, \]
\[ \gamma_1 = -\gamma_2 = \gamma = 4.9912, \]
\[ \sigma_1 = \sigma_2 = 3.5750. \]

Thus to a first approximation:

                        Females.       Males.
Mean:                   70.824 mm.     80.806 mm.
Total Frequency:        270.5          270.5           ...(B).
Standard Deviation:     3.5750 mm.     3.5750 mm.
Modal Ordinate:         30.19          30.19


(B), statistically speaking, is so close to (A) that it gives every confidence of 
a second approximation practically reproducing (A). 
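
The first approximation needs only $N$, $M$, $\mu_2$ and $\epsilon_1$. A short Python sketch of equations (ix), using the printed constants as inputs; the variable names are illustrative, and the modal ordinate is taken as $n/(\sqrt{2\pi}\,\sigma)$, as in the footnote to solution (A):

```python
import math

# Printed constants of the unsexed series.
N, M = 541, 75.8152
mu2, eps1 = 37.692112, 0.4368255

# Equations (ix): two equal normal components placed symmetrically about M.
n = N / 2.0
gamma = math.sqrt(mu2) * eps1 ** 0.25
sigma = math.sqrt(mu2 * (1.0 - math.sqrt(eps1)))

female_mean = M - gamma
male_mean = M + gamma
modal_ordinate = n / (math.sqrt(2.0 * math.pi) * sigma)

print(round(gamma, 4), round(sigma, 4))
print(round(female_mean, 3), round(male_mean, 3), round(modal_ordinate, 2))
```

The square root inside `sigma` is real only when $\epsilon_1 < 1$, and `gamma` only when $\epsilon_1 > 0$, which is the platykurtic condition $\beta_2 < 3$ noted in the text.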


We find:
\[ \frac{\mu_3}{\gamma^3} = -0.0208112, \qquad \frac{\mu_5}{\gamma^5} = -0.0260386, \]
\[ \frac{\sigma^2}{\gamma^2} = 0.5130274, \qquad \frac{\gamma^2}{\sigma^2} = 1.949228. \]


* $y_0 = \dfrac{n}{\sqrt{2\pi}\,\sigma}$ of the normal curve.
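Hence by (xvii) to (xix):
\[ \delta\gamma_1 = \delta\gamma_2 = -\gamma \times 0.056308 = -0.2810, \]
\[ \delta\sigma_1 = -\delta\sigma_2 = \sigma \times 0.029809 = 0.1066, \]
\[ \delta n_1 = -\delta n_2 = +n \times 0.056308 = 15.231. \]
It will be seen from these results that:
\[ \frac{\delta\gamma_1}{\gamma} = -0.0563, \qquad \frac{\delta\sigma_1}{\sigma} = 0.0298, \qquad \frac{\delta n_1}{n} = 0.0563 \]
may be considered fairly small quantities, and that they justify our assumption.
We have accordingly: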
We have accordingly: 


                        Females.       Males.
Mean:                   70.543 mm.     80.525 mm.
Total Frequency:        255.27         285.73          ...(C).
Standard Deviation:     3.4684 mm.     3.6816 mm.
Modal Ordinate:         29.36          30.96

It is clear that the solutions (C) and (A) are for all practical purposes identical.
Thus the short method is justified in the problem of sexing osteometric material.
An improper extension of the method to material in which the sexes occur in very
unequal groups may be guarded against by simply observing whether $\beta_1$ and $\beta_3$
are very small quantities.
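
The whole short method, from the first approximation (ix) through the corrections (xvii) to (xix), can be sketched in a few lines of Python. This is an illustrative sketch built on the printed moments, with subscript 1 taken as the males (the component with the larger mean); the variable names are not from the paper. Carried through in floating point it gives the values of (C) to about the third significant figure. The small residual differences are of the size to be expected from rounding in the printed intermediate ratios.

```python
import math

# Printed constants of the unsexed series.
N, M = 541, 75.8152
mu2, mu3, mu4, mu5 = 37.692112, -2.587693, 3020.898695, -83.260992

# First approximation, equations (ix).
eps1 = 0.5 * (3.0 - mu4 / mu2 ** 2)
n = N / 2.0
gamma = math.sqrt(mu2) * eps1 ** 0.25
sigma = math.sqrt(mu2 * (1.0 - math.sqrt(eps1)))

# Ratios entering the corrections.
r3 = mu3 / gamma ** 3
r5 = mu5 / gamma ** 5
s = sigma ** 2 / gamma ** 2

# Equations (xvii), (xviii), (xix): relative corrections.
d_gamma = 1.25 * (1.0 + 3.0 * s) * r3 - 0.375 * r5              # delta gamma_1 / gamma
d_sigma = -(0.25 * (1.0 + 5.0 * s) * r3 - 0.125 * r5) / s        # delta sigma_1 / sigma
d_n = -d_gamma                                                   # delta n_1 / n

# Second approximation: 1 = males, 2 = females.
male = {"mean": M + gamma + gamma * d_gamma,
        "n": n * (1.0 + d_n),
        "sd": sigma * (1.0 + d_sigma)}
female = {"mean": M - gamma + gamma * d_gamma,
          "n": n * (1.0 - d_n),
          "sd": sigma * (1.0 - d_sigma)}

for label, comp in (("Females", female), ("Males", male)):
    print(label, round(comp["mean"], 3), round(comp["n"], 2), round(comp["sd"], 4))
```

The check suggested in the text applies before running this: if $\beta_1$ and $\beta_3$ are not very small, the relative corrections will not be small either and the linearised equations should not be trusted.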


In conclusion it may be desirable to compare the values of these sex-constants 
as found mathematically with sexing by anatomical appreciation. I owe an 
anatomical sexing of the same bones to my colleague, Dr Derry. 


The following values of the constants resulted : 


                        Females.        Males.
Mean:                   70.098 mm.      79.764 mm.
Total Frequency:        221             320             ... (D).
Standard Deviation:     3.5148 mm.      4.1254 mm.
Modal Ordinate:         24.55           30.95


It will be seen that the mathematically deduced constants are not widely 
divergent from those obtained anatomically, but the accordance if fair is not ideal. 
The accompanying diagram exhibits the differences in the frequency distributions 
found by the two methods of sexing. The chief difference lies in the transfer by 
the anatomist of the larger female bones of the mathematical sexing to the male 
group. I do not propose to discuss here the relative advantages of the two 
methods, but would draw attention to a few points of interest : 


(i) The solution (D) makes no appeal to measurement in the sexing; it is
based purely on an anatomical appreciation. It would therefore be subject to
personal equation, depending on the features upon which the experience of the
individual anatomist leads him to lay most stress. The solution (C) is unique,
that is to say, given the same data, all statisticians would reach the same values, of
course apart from errors in arithmetic or from the number of decimal places
retained in the working. It eliminates the factor of personal equation.


Frequency Distributions of Bicondylar Width in Male and Female Femora
sexed by Anatomical Appreciation.

 mm.     ♀       ♂    |   mm.     ♀       ♂    |   mm.     ♀       ♂
 61      1       –    |   71     20.5     2.5  |   81      –      28
 62      1       –    |   72     29.5     4    |   82      –      23
 63      1.5     –    |   73     16       9    |   83      –      19
 64      5       –    |   74     11.5    10.5  |   84      1      16.5
 65     13.5     –    |   75     10      26    |   85      –      19.5
 66     14       –    |   76      7      18.5  |   86      –      16.5
 67     15.5     –    |   77      2.5    27    |   87      –       7.5
 68     22       –    |   78      1      31.5  |   88      –       3
 69     31       –    |   79      1      18.5  |   89      –       3.5
 70     15.5     3.5  |   80      1      32    |   90      –       0.5
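
The frequency distribution of the anatomically sexed femora can be checked against the constants of solution (D). The sketch below is ours and rests on one stated assumption: three cells that are damaged in this copy of the table (63 mm. ♀, 74 mm. ♀, 86 mm. ♂) are read as 1.5, 11.5 and 16.5, the readings that make the column totals equal to the 221 and 320 of table (D).

```python
# Bicondylar width in mm. -> frequency, anatomical sexing.
# Assumption: damaged cells read so that totals match table (D).
female = {61: 1, 62: 1, 63: 1.5, 64: 5, 65: 13.5, 66: 14, 67: 15.5,
          68: 22, 69: 31, 70: 15.5, 71: 20.5, 72: 29.5, 73: 16,
          74: 11.5, 75: 10, 76: 7, 77: 2.5, 78: 1, 79: 1, 80: 1, 84: 1}
male = {70: 3.5, 71: 2.5, 72: 4, 73: 9, 74: 10.5, 75: 26, 76: 18.5,
        77: 27, 78: 31.5, 79: 18.5, 80: 32, 81: 28, 82: 23, 83: 19,
        84: 16.5, 85: 19.5, 86: 16.5, 87: 7.5, 88: 3, 89: 3.5, 90: 0.5}


def total_and_mean(freq):
    """Return the total frequency and the mean of a grouped distribution."""
    total = sum(freq.values())
    mean = sum(width * f for width, f in freq.items()) / total
    return total, mean


total_f, mean_f = total_and_mean(female)
total_m, mean_m = total_and_mean(male)
print(total_f, round(mean_f, 3))
print(total_m, round(mean_m, 3))
```

On this reading the totals are 221 and 320, and the means agree with those of (D) to about a thousandth of a millimetre. The standard deviations are not recomputed here, since the text does not say whether any grouping correction was applied.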


(ii) (C) would, however, be influenced by the fact that our material is not
perfectly homogeneous except for sex; because (a) there is a mixture of right and
left bones, and, to judge by the anatomical sexing, this may involve a difference of
.7 to .9 mm. in the means and .08 to .24 mm. in the standard deviations; this
would add to the heterogeneity, (b) our bones may be due to somewhat mixed
classes and possibly mixed periods, (c) the bicondylar width is liable to be injured 
by rough treatment of the bone, and this injury will most affect the weaker, and 
therefore probably the younger, bones. These bones might then be treated as female, 
a classification which most anatomical sexing also favours. While the total number 
of these London femora is nearly 800, the bicondylar width could only be measured 
in 541 cases. This selection will not necessarily be random as to size or sex, 
and may modify our constants found mathematically from the distribution. On 
the other hand it would affect also the anatomical appreciation of sex, but only 
in as far as it was based on the size of the condyles. 


(iii) We know from very considerable sexed data that the variation of man 
and woman is very nearly the same. The coefficients of variation measured in the 
usual way, i.e. by 100 standard deviation divided by mean, gave: 


        Mathematical Sexing.            Anatomical Sexing.
        ♀ 4.92     ♂ 4.57               ♀ 5.01     ♂ 5.17
        Δ = .35                         Δ = −.16

There was thus closer sexual accord from the anatomical method. But when
the same anatomical sexing was applied to the character of the head of the femur
in the vertical plane, I found for right bones:


        ♀ 5.05     ♂ 6.37     Δ = −1.32,
and for left bones:
        ♀ 4.91     ♂ 6.10     Δ = −1.19,


differences far greater than occur in the mathematical sexing from the bicondylar 
widths. Accordingly no great stress can be laid on inequalities in the coefficients 
of variation deduced from either process of sexing. 
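
The coefficients of variation for the bicondylar width follow directly from the means and standard deviations of solutions (C) and (D). A minimal sketch, with a function name of our own choosing; the differences are taken between the two-decimal coefficients, which is our reading of how the tabulated Δ's were formed:

```python
def coefficient_of_variation(sd, mean):
    """100 x standard deviation / mean, as defined in the text."""
    return 100.0 * sd / mean


# (standard deviation, mean) in mm., from tables (C) and (D).
mathematical = {"female": (3.4684, 70.543), "male": (3.6816, 80.525)}
anatomical = {"female": (3.5148, 70.098), "male": (4.1254, 79.764)}

for label, sexing in (("mathematical", mathematical), ("anatomical", anatomical)):
    cv_f = round(coefficient_of_variation(*sexing["female"]), 2)
    cv_m = round(coefficient_of_variation(*sexing["male"]), 2)
    print(label, cv_f, cv_m, round(cv_f - cv_m, 2))
```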


It would appear to me that we have reached on the whole a reasonable
biometric method of sexing. To what extent it can replace the sexing by 
anatomical appreciation must be left to the future. But it is clear that when 
anatomists themselves prefer to that appreciation an appeal to a single character, 
e.g. to the measurement of the femoral head, and only settle by anatomical appre- 
ciation the sex of femora with diameters between 45 and 47 mm., then they do 
not show much confidence in their own method of sexing. An interesting experi- 
ment could be made if some 400 to 500 sexed bones were available, and then, 
without knowledge of the real sex, two or three anatomists and a statistician were 
to be asked independently to determine the mean and variability of two or three 
characters of the bones of each sex in this material. 


I have cordially to acknowledge the help of my colleague Mr E. Soper in the 
determination of equations (xvii)—(xix) and in their solution (C) in the numerical 
case for which I had reached the solution (A); also the labour of my colleague 
Miss H. Gertrude Jones in the preparation of the diagram which contrasts 
graphically the mathematical and anatomical solutions of the problem. 


FURTHER EVIDENCE OF NATURAL SELECTION 
IN MAN. 


By ETHEL M. ELDERTON, Galton Research Fellow, 
AND KARL PEARSON, F.R.S.


(1) The second author of the present paper writing in 1894 a commentary on
the statement that “no man, as far as we know, has ever seen natural selection at 
work,” remarked : “ Every man who has lived through a hard winter, every man 
who has examined a mortality table, every man who has studied the history of 
nations has probably seen natural selection at work*.” The emphasis is here to 
be laid on the word “ probably,” because the seeing depends on the power and 
validity of the scientific means adopted to analyse the observed facts. In a paper 
communicated by the same author to the Royal Society in June 1912+, it was 
shown from the Registrar-General’s series of ten yearly life-tables that when 
allowance was made for change of environment in the course of the fifty years a 
very high association existed between the deaths in the first year of life and 
the deaths in childhood (1 to 5 years). This association was such that if the 
infantile deathrate increased by 10% the child deathrate decreased by 5.3% in
males, while in females the fall in the child deathrate was almost 1% for every
rise of 1% in the infantile deathrate. The method of investigating by life-tables
could not be extended beyond 1900, because the life-tables for the next ten 
years (1901-1910) were not then out, and indeed have only just appeared 
(December 1914). While the infantile deathrate as shown from the life-tables 
had risen from 1871-1900, the child deathrate had fallen for the same period. 
During the next decade 1900-1910 both deathrates have fallen together; such 
a secular change does not in any way modify the argument of the paper, which 
lies in the statement that whether two deathrates rise together or rise and fall 
simultaneously we can draw no inferences at all, until they have been corrected for
secular change. Most economic, demographic and physical variates are changing
continuously with time, and no comparison of time graphs or calculation of
correlations will demonstrate of necessity anything but spurious association, until
the time factor has been eliminated. It is the deviations from the continuous
curves of secular change which may turn out on careful analysis to be truly
indicative of causal relationship between the variates under consideration.

* The Chances of Death and other Studies in Evolution, Vol. I. p. 166.
† “The Intensity of Natural Selection in Man.” R. S. Proc. B. Vol. 85, pp. 469–476.


The first attempt to get rid of secular change by a method of differences was 
made by Miss F. E. Cave in 1904 in a paper on barometric correlations*, and 
shortly afterwards Mr R. H. Hooker published a paper dealing with the same 
point†. Both these authors used only first differences and gave no general theory
of the method. Quite recently “Student” has published a paper‡ giving the
fundamental formulae, and indicating how by taking successive differences of two 
variates and correlating them, we free ourselves from the time or locality influence, 
and approach the true and probably causal relationship between them. When the 
correlation of the differences becomes steady, then we have reached the actual 
correlation of the variates corrected for the time factor, provided an assumption is 
made which we shall discuss at greater length below: see footnote, p. 495. Mean- 
while Dr Anderson of Petrograd has been working on the subject, and in a 
most valuable memoir§ he has added to “Student’s” results a number of new 
theorems; for example, the probable errors of the successive difference corre- 
lations when they become steady, and the relations which should be fulfilled 
between the squares of the standard deviations of successive differences, when 
the series has become steady. We have thus a double means of ascertaining 
whether the desired object—the elimination of the time-factor—has been approxi- 
mately achieved. A third additional test will be indicated in this paper. 
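
The principle can be illustrated on artificial data. In the sketch below, which is ours and uses no figures from the paper, two series share a falling linear trend while their intrinsic parts are constructed to correlate at −.7. The raw series correlate strongly and positively, but the correlation of their successive differences settles near the intrinsic value. The trend coefficients, the series length of 400 and the seed are arbitrary choices made for the illustration.

```python
import math
import random


def difference(series, order):
    """Backward differences of the given order (one value lost per differencing)."""
    out = list(series)
    for _ in range(order):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


def correlation(xs, ys):
    """Product-moment correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1915)   # fixed seed so the run is repeatable
intrinsic_r = -0.7          # assumed intrinsic correlation of X and Y
x, y = [], []
for t in range(400):
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    big_x = e1
    big_y = intrinsic_r * e1 + math.sqrt(1.0 - intrinsic_r ** 2) * e2
    x.append(170.0 - 0.08 * t + big_x)   # secular fall plus intrinsic part
    y.append(65.0 - 0.06 * t + big_y)

raw_r = correlation(x, y)
diff_r = {s: correlation(difference(x, s), difference(y, s)) for s in range(1, 7)}
print(round(raw_r, 3))
for s, r in diff_r.items():
    print(s, round(r, 3))
```

Unlike the procedure described later in the paper, this sketch lets the series shorten by one term at each differencing instead of extending the data to keep the number of differences constant.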


This new statistical process has been termed the Variate Difference Correlation 
Method||, and there is small doubt that it is the most important contribution to 
the apparatus of statistical research which has been made for a number of years 
past. Its field of application to physical problems alone seems inexhaustible. We 
are no longer limited to the method of partial correlation, nor compelled to seek 
for factors which rendered constant will remove the changing influence of
environment. In the present case, that of the influence of infantile mortality on child
mortality, Pearson endeavoured to eliminate the influence of continual
environmental improvement by making the expectation of life at six years constant¶.
Snow achieved the same object by correlating the deathrates of one sex for a 
constant deathrate of the other**. In both these cases substantial evidence of 
Natural Selection was obtained from the mortality tables. The object of the 
present paper is to demonstrate by the still more complete elimination of the
time factor involved in the variate difference correlation method that a selective
deathrate plays even in highly civilised states a marked part in the natural history
of man.

* R. S. Proc. Vol. LXXIV. pp. 407 et seq.
† Royal Statistical Society Journal, Vol. LXVIII. pp. 396 et seq. 1905.
‡ Biometrika, Vol. X. pp. 179, 180.
§ Ibid. pp. 269–279.
|| Pearson and Cave: “Numerical Illustrations of the Variate Difference Correlation Method.” Biometrika, Vol. X. pp. 340–355.
¶ R. S. Proc. B. Vol. 85, p. 472.
** “The Intensity of Natural Selection in Man.” Drapers’ Company Research Memoirs, Dulau & Co., 1911.


(2) The material dealt with in this investigation consists of the Registrar- 
General’s returns for births in England and Wales and of deaths in the first five 
years of life from 1859 to 1908 with the addition of as many years before 1859 
as were requisite to make our highest differences fifty in number, and with the 
addition of as many years after 1908 as were requisite for following up the births 
of that year to the fifth year of life. Thus actually our data extended from 1850 
to 1912. The reason for this procedure lies in the desirability of using a constant 
population, and not reducing by one a relatively small number like 50 on each 
differencing. As a result of this process we had to modify Dr Anderson’s values
for the probable errors for the steady values of the difference correlations because
in our case the size of the population does not change as we proceed to higher
differences*. The second cause which requires extension of the data is a very 
important one, and must be illustrated numerically. Consider the table: 


Deaths of those born in a given year.

          Female
Year      Births      0–1       1–2       2–3       3–4       4–5
1908     478,410    63,594       –         –         –         –
1909        –          –       14,146      –         –         –
1910        –          –         –       5,020       –         –
1911        –          –         –         –       3,449       –
1912        –          –         –         –         –     [illegible]


Now the deaths of infants 0–1 in 1908 are not necessarily of infants all born
in 1908, but the total deaths 63,594 must represent closely the deaths in the
478,410 infants born in that year. Disregarding immigration and emigration, this
gives a deathrate per 1000 of 107.495 and leaves 414,816 children alive. Of this
group 14,146 may be taken to die in the second year of life, giving a deathrate of
31.990 per mille. There remain 400,670 children who reach the third year of life
in 1910, of whom 5,020 die, giving a deathrate of 11.939, and 395,650 survivors.
These survivors are followed into 1911 and 1912 in the same manner, and thus
we obtain approximately the deathrate up to the fifth year of the male children
born in 1908. We thus in bulk follow the same group of children through the
first five years of life. Tables I and II give the deathrates for males and females
respectively under the heading of the birth year of each group. These deathrates
have been taken to three decimal places for the purpose of determining the
higher differences correctly to one decimal place. The successive differences of


* All the probable errors of the difference correlations given in this memoir are these modified 
Andersonian values, i.e. they are the probable errors on the assumption that the difference correlations 
have reached steady values.

TABLE I.

Deathrates in each Year of Life for groups born in the Year of 1st column.

Males.

Year      0–1        1–2        2–3        3–4        4–5
1850    159.781     63.935     34.092     22.138     19.021
1851    168.706     64.977     33.882     26.657     18.209
1852    173.324     63.533     40.936     24.316     16.022
1853    174.808     74.802     35.464     22.298     16.720
1854    170.890     60.598     30.740     21.688     21.329
1855    169.151     59.696     33.004     29.546     19.780
1856    156.756     64.317     39.551     25.594     13.751
1857    168.486     67.928     36.777     19.471     13.923
1858    172.591     68.712     30.566     19.857     16.268
1859    167.106     58.887     31.650     22.325     21.962
1860    162.642     70.401     35.297     30.083     21.325
1861    167.634     65.785     41.390     27.654     16.236
1862    156.684     73.848     37.326     22.013 [?] 14.982
1863    163.183     67.537     32.774     21.088     12.552
1864    166.309     66.462     34.408     17.660     15.991
1865    174.356     68.369     28.278     21.473     17.015
1866    173.659     60.603     32.159     22.751     18.105
1867    166.905     63.790     32.731     23.220     16.085
1868    168.064     62.988     32.357     20.185     12.821
1869    169.022     65.716     30.186     17.450     11.624
1870    174.287     62.401     26.760     16.344     16.052
1871    171.840     59.853     25.503     21.635     14.727
1872    162.321     59.267 [?] 30.754     18.731     12.655
1873    163.676     61.415     28.007     17.275     12.502
1874    164.976     60.316   [illegible]  16.028     13.499
1875    173.145     58.422     25.400     18.848     13.187
1876    160.415     56.088     27.992     17.044     13.263 [?]
1877    149.627     63.344     26.104     17.171     11.738
1878    166.266     58.104     27.853     15.280     13.049
1879    149.754     66.188     22.245     16.741     12.115
1880    167.313     48.181     25.909     16.036     11.920
1881    142.532     59.164     23.333     15.261     10.430
1882    153.154     54.365     24.026     14.362      9.416
1883    151.184     58.292     23.015     13.823     11.042
1884    160.381     53.952     22.077     14.792      9.533
1885    151.175     57.960     23.052     13.615     10.073
1886    163.081     55.611     21.009     13.974     10.569
1887    158.243     50.713     22.169     14.394      9.894
1888    150.177     56.882     22.958     13.536     10.150
1889    157.476     56.279     22.338     13.750     10.647
1890    164.757     59.098     22.604     14.551      9.623
1891    163.761     54.255     20.821     12.615      8.757
1892    162.112     52.776     19.442     12.497     10.604
1893    173.333     47.035     20.343     14.121      8.546
1894    149.633     55.787     21.271     11.750      8.439
1895    176.280     51.404     19.535     11.912      8.973
1896    160.989     50.293     18.742     11.802      9.366
1897    170.291     50.986     19.124     12.396      8.608
1898    175.183     48.227     19.380     11.496      8.657
1899    176.606     49.837     17.094     11.565      7.165
1900    168.685     44.728     17.400      9.759      7.553
1901    165.617     42.597     15.570     10.241      7.135
1902    146.791     40.581     16.572      9.587      6.853
1903    144.567     45.517     15.368      9.260      6.949
1904    158.684     38.921     15.234     10.194      6.458
1905    141.193 [?] 39.326     15.702      8.893      6.886
1906    144.819     38.037     14.222      8.858      5.655
1907    130.259     36.615     15.159      7.643      6.230
1908    132.928     34.102     12.529      8.717      5.962


TABLE II.

Deathrates in each Year of Life for groups born in the Year of 1st column.

Females.

Year      0–1        1–2        2–3        3–4        4–5
1850    130.477     62.235     34.147     22.625     19.278
1851    138.306     62.106     33.992     27.087     16.762
1852    142.185     61.812     39.787     23.805     16.148
1853    144.270     71.939     35.186     22.686     16.948
1854  [illegible]   59.024     30.863     22.226     21.907
1855  [illegible] [illegible]  34.057     29.375     20.501
1856    129.877     61.902     39.758     26.146     14.305
1857    142.203     65.853     36.712     19.990     14.391
1858    143.595     64.513     29.716     20.796     17.089
1859    138.477     55.535 [?] 32.024     24.485     21.454
1860    131.914     66.902     36.060     29.941     20.765
1861    137.339     61.860     41.243     27.727     16.287
1862    127.203     70.313     36.543     21.905     15.398
1863    133.321     63.438     32.637     21.623     12.702
1864    138.232     63.395     34.846     18.217     15.547
1865    145.388     66.401     28.484     21.437     17.145
1866    144.879     58.180     32.392     23.086     17.536
1867    137.712     61.641     33.243     23.081     15.653
1868    141.756     58.624     31.892     20.060     12.524
1869    141.450     60.720     31.004     18.108     11.732
1870    144.596     59.845     26.855     16.235     15.196
1871    143.353     56.379     25.062     21.493     13.903
1872    136.453     52.652     30.213 [?] 18.454     12.571
1873    134.079     57.739     27.549     16.950     11.421
1874    135.939     57.795 [?] 25.498     16.031     13.253
1875    142.730     54.032     24.563     18.848     12.833
1876    131.718     51.057     27.976     16.685     12.651
1877    121.936     59.535 [?] 25.135     17.441     11.214
1878    137.958     52.463     27.646     15.186     12.491
1879    120.643 [?] 62.324     21.792     16.911     11.880
1880    137.691     45.620     25.319     15.556     11.335
1881    117.221     55.413     22.443     15.359     10.396
1882    127.627     50.057     23.539     14.178      9.586
1883    122.766     54.048     22.948     13.643     10.606
1884    132.701     49.967     21.259     14.913      9.401
1885    123.667     53.415     22.740     13.251     10.030
1886    134.814     51.108     19.985     13.800     10.295
1887    130.689     46.402     21.538     14.594     10.012
1888    122.312     53.369     22.456     13.926     10.146
1889    129.148     53.307     21.783     14.170     10.569
1890    135.839     54.850     21.607     14.731      9.478
1891    132.763     51.567     20.841     12.646      8.900
1892    132.414     49.387     19.174     12.708     10.370
1893    143.346     44.511     19.842     14.293      8.643
1894    123.522     52.267     21.499     12.183      8.300
1895    144.326     49.339     18.643     11.817      8.499
1896    133.535     46.691     18.439     12.184      9.282
1897    140.755     47.998     18.109     12.739      8.642
1898    145.001     44.832     18.436     11.490      8.865
1899    148.000     45.928     16.954     11.724      7.164
1900    139.148     41.901     16.902      9.712      7.611
1901    136.346     39.527     15.156     10.606      7.184
1902    118.479     37.064     16.168      9.891      6.792
1903    118.004     42.270     14.774      9.315      7.615
1904    131.477     36.598     14.136      9.793      6.453
1905    114.641     37.084     15.066      8.710      6.902
1906    119.668     36.006     13.552      9.168      5.513
1907    104.487     33.904     13.789      7.493      6.071
1908    107.495     31.990     11.939      8.546      5.939

these deathrates up to the sixth and, in a few cases, to the tenth were then
formed. In our notation $m_r$ is the deathrate in the $r$th year of life, i.e. from $r-1$
to $r$ years of age, and $\delta_s m_r$ is the $s$th difference of this deathrate. As we have five
deathrates for each sex this involves 10 means, 10 standard deviations and 20 corre- 
lation coefficients, but as we have used six successive differences these numbers 
must be multiplied by seven. The calculation of these differences and of upwards 
of 150 correlation coefficients has meant very strenuous labour. It must, indeed, be 
admitted that the application of the variate difference correlation method is not, 
even with small populations, a light task, but the change from the high positive 
to low negative and then to high negative values of the correlation is of 
extraordinary interest, and indicates the stages by which the associations are 
freed from the spurious influence of the time-factor. 
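
The cohort-following arithmetic described in this section can be sketched as below, using the births and deaths of the illustrative table for the group born in 1908. The function name is ours. Worked from the counts as printed, the quotients come to 132.928, 34.102, 12.529 and 8.717 per 1000, which coincide with the 1908 row of Table I (males); the rates quoted in the running text (107.495, 31.990, 11.939) are those of the 1908 row of Table II (females).

```python
def cohort_deathrates(births, deaths_by_year_of_life):
    """Follow one birth cohort: deaths in each year of life are divided by
    the number still alive at the start of that year, per 1000."""
    alive = births
    rates, survivors = [], []
    for deaths in deaths_by_year_of_life:
        rates.append(1000.0 * deaths / alive)
        alive -= deaths
        survivors.append(alive)
    return rates, survivors


# Counts from the illustrative table (births of 1908; deaths 1908 to 1911).
rates, survivors = cohort_deathrates(478_410, [63_594, 14_146, 5_020, 3_449])
print([round(r, 3) for r in rates])
print(survivors)
```

Migration is disregarded, as in the text, and the deaths registered in a calendar year are treated as if they all fell on the cohort that reaches the corresponding year of life in that calendar year.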


(3) All our correlations are given in Table III (p. 497), but it is desirable to 
discuss in detail certain groups of them. We take first the correlations of the 
deathrates in successive years. They are: 


                        Male.               Female.
$r_{m_1 m_2}$       +.398 ± .080        +.390 ± .081
$r_{m_2 m_3}$       +.859 ± .025        +.864 ± .024
$r_{m_3 m_4}$       +.924 ± .014        +.928 ± .013
$r_{m_4 m_5}$       +.911 ± .016        +.917 ± .015


All these are positive, all are significant and, the first excepted, are very high 
correlations. There is no significant difference between male and female. The 
least important is the relation between deaths in infancy and deaths in the first 
year of childhood. We have in these correlation coefficients the numerical 
expression of what is obvious in Tables I and II, i.e. as the deathrate in any year
of age falls so does the deathrate of the same group in the following year. It is
this fact which has led to the erroneous idea that natural selection plays no part
in man. The fact, however, simply expresses the continuous change of
environment which has been in progress since 1860. During the half-century improved
economic conditions, bettered sanitation, and developed medical care have lowered 
the deathrate at each age*. It is therefore impossible to deduce any argument 
as to natural selection in man from these correlations until we have removed this 
continuous influence of the time-factor. This is achieved by the variate difference 
correlation method. In every case a preliminary examination of Tables I and II 
shows that the correlation of the first differences of the deathrates of successive 
years is negative, and as we take higher and higher differences the intensity of this 
negative correlation increases, until with the sixth differences it reaches to the 


* As we have already remarked the infantile deathrate showed little of this improvement till 1905.
It was about this same year that the absolute number of births in England and Wales began to decline,
so that while the population has increased by something like 34 millions, that population produces
about 76,000 fewer babies annually.

very substantial value of about −.7. In other words a rise in the deathrate of
one year of life means a fall in the deathrate of the following year of a most 
marked kind. While with the sixth differences we are approaching fairly closely 
steady values it may be doubted whether we have reached them in any case but 
that of 154m. dgm;:_ Lhe following are the sixth difference correlations in the case 
of the deathrates of successive years: 


                                        Male.               Female.
$r_{\delta_6 m_1 . \delta_6 m_2}$   −.688 ± .090        −.719 ± .081
$r_{\delta_6 m_2 . \delta_6 m_3}$   −.673 ± .092        −.660 ± .095
$r_{\delta_6 m_3 . \delta_6 m_4}$   −.703 ± .085        −.731 ± .078
$r_{\delta_6 m_4 . \delta_6 m_5}$   −.695 ± .087        −.736 ± .077


Again the male and female results are in excellent agreement, and we grasp 
the startling manner in which the new method reverses a judgment based on
relations which have been deduced without any regard to secular change. 
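
The probable errors attached to the correlations of the undifferenced deathrates agree with the usual formula $.67449\,(1-r^2)/\sqrt{n}$ taken with $n = 50$. The text does not state the formula at this point, so the check below is our reading, and it applies only to the raw correlations; the probable errors of the difference correlations are the modified Andersonian values described in the paper and are not reproduced by this formula.

```python
import math


def probable_error_of_r(r, n):
    """Usual probable error of a correlation coefficient r from n pairs."""
    return 0.67449 * (1.0 - r * r) / math.sqrt(n)


# (r, printed probable error) for successive years of life, males then females.
raw_correlations = [
    (0.398, 0.080), (0.859, 0.025), (0.924, 0.014), (0.911, 0.016),
    (0.390, 0.081), (0.864, 0.024), (0.928, 0.013), (0.917, 0.015),
]

for r, printed in raw_correlations:
    print(r, printed, round(probable_error_of_r(r, 50), 3))
```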


(4) The question naturally arises: How far are these the “steady” values of 
the difference correlations measuring the organic relation apart from the time- 
factor of the deathrates in different years of infancy and childhood ? 


There are three fundamental tests: (i) The correlation coefficients of
successive differences should have ceased to be markedly rising or falling. Table III
(p. 497) shows that this is approximately but not absolutely the case, but we have
reached a stage in which any further changes are certainly of the order of the
probable errors and thus of little significance. The unsteadiness as will be in- 
dicated later in better tests is greatest in the differences of the deathrates in the 
first and second years of life. Here the correlations were taken to the seventh 
and eighth differences and gave: 


                                        Male.               Female.
$r_{\delta_7 m_1 . \delta_7 m_2}$   −.696 ± .090        −.729 ± .082
$r_{\delta_8 m_1 . \delta_8 m_2}$   −.692 ± .094        −.731 ± .084

which appear to have reached practical steadiness. Actually the final correlations 
must be somewhat greater than those obtained from the sixth differences. To 
push the process further, however, would be of small advantage because higher 
differences involve introducing earlier data, and the birthrate data before 1855 
become more and more unreliable. Again in the extremely high differences, the 
additional year required for an additional difference if not appertaining to rela- 
tively smooth data may in itself, when we have only a small total frequency of 50, 
produce a certain amount of unsteadiness. 


(ii) We may consider the mean values of the differences.

If our first variable be taken* as $x = \phi_r(t) + X$, where $X$ is the intrinsic value
of $x$ as apart from the time change, then mean $\delta_{r+1}x$ after steadiness has set in is


* One of the bases of the variate difference correlation method lies in the assumption that the 
intrinsic variation is superposed on a secular change of a continuous character; the causes which 
determined the intrinsic variation X are supposed to be sensibly independent of the time for the 
period under consideration. We conceive the secular change as given by a parabola, say, of the 
sth order, but the deviations from this curve are supposed in magnitude and sense to be independent 
of the time, i.e. due to chance causes which are the same in 1850 as in 1900. This assumption 
is an important one and must lead to our seeking relatively short periods consistent with a numerical 
frequency sufficient for significance. It can be roughly tested, of course, by considering ox as found 
from, say, the first and second halves of our observations. In our own case we found: 


Values of $\sigma_X$ deduced from Sixth Differences for 1st 25, for 2nd 25,
and for all 50 years.

                    (m1)           (m2)           (m3)           (m4)           (m5)
                   ♂     ♀       ♂     ♀       ♂     ♀       ♂     ♀       ♂     ♀
1st 25 years      7.32  6.94    5.51  5.61    2.09  2.30    1.52  1.67    1.05  0.91
All 50 years      8.61  7.83    4.71  4.63    1.59  1.77    1.17  1.28    0.86  0.78
2nd 25 years      9.70  8.61    3.73  3.37    0.83  0.98    0.66  0.68    0.63  0.62


These values are less steady than we had originally hoped for. Clearly the variability of the X 
portion of the infantile deathrate has grown greater, and that of the four child deathrates has grown 
sensibly smaller with the time. The fundamental hypothesis of the variate difference method is therefore
only approximately true for this material. We have made some investigations on the assumption
that $x = \phi_r(t) + (a + bt)X$, but the values of $a$ and $b$ obtained were by no means satisfactory. We have
in hand a further investigation of the problem by the method, originally suggested by one of us, before
the difference method was started; namely to subtract from $x$ the value obtained by the best fitting
parabola of the $s$th order in the time and so to reach the actual values of $X$. The relation of these
to the time can then be found with some degree of accuracy. To the male deathrates of the second and 
fourth years of life we applied parabolae of the third order in the time, and obtained excellent fits ; we 
then subtracted the ordinates of these parabolae from the deathrates and correlated the remainders, 
dy and dy say. We found Wiehe +°312+-088, a value corresponding more nearly with "551mg d5neg than 


T Sm dgmy? and indicating that we might more rapidly approach final values by this method than by 


that of variate differences. But the fitting of high order parabolae is very laborious; at the same time 
the graphs give excellent tests of the accuracy of the work, and we obtain the actual values of what 
we have termed X and Y, as represented by dy and dy. We then correlated the numerical value of ds 
with the time and found "dt = —°284+-089. It is clear that with correlations of this order with the 


time, "dod, would not be modified by the extent of its probable error if we found the partial corre- 
lation tdydy? or corrected the correlation of dy and d, for the time. There is another point, however, 


which justifies us in disregarding this variation of X and Y with the time as of secondary importance. 
The correlation of X with the time is positive in the first year’s mortality and negative in the following 
four years; thus while it would certainly tend to give a negative value to $r_{XY}$ for the 1st and 2nd
years of life, it would tend to give a positive value to the correlation for all successive pairs of years
beyond the 1st and 2nd. Now all such successive pairs of years have high negative values, which are
therefore minimum values, but these values are all in excellent agreement (roughly equal to −.7) with
that found for the 1st and 2nd years of life. We therefore concluded that the influence of the time on
the deviations from the secular curve of change, although very sensible, is of no substantial importance
for the correlations.




equal to mean $\delta_{r+1}X$, and this (taking, as we have done, ‘backward’ differences)
is given (the $C$’s being the usual binomial coefficients) by

\[ \frac{1}{n}\Bigl\{\bigl(X_n - C_1 X_{n-1} + C_2 X_{n-2} - \dots\bigr) - \bigl(X_0 - C_1 X_{-1} + C_2 X_{-2} - \dots\bigr)\Bigr\}. \]

Now if we remember that the $X$’s have chance values uncorrelated with each
other then we shall have for the squared standard deviation of the mean $\delta_{r+1}X$,

\[ \sigma^2_{\text{mean } \delta_{r+1}X} = \frac{2\sigma_X^2}{n^2}\bigl(1 + C_1^2 + C_2^2 + \dots + C_r^2\bigr). \]

Or, the probable error of the mean $(r+1)$th difference after the steady values
have been reached

\[ = .67449\,\frac{\sqrt{2\,(2r)!}}{r!}\,\frac{\sigma_X}{n}. \]

At first sight this appears of no value, because $\sigma_X$ is unknown, but Dr Anderson
has given $\sigma_{\delta_r X}$ in terms of $\sigma_X$ when steady values have been reached*, i.e.

\[ \sigma^2_{\delta_r X} = \frac{(2r)!}{(r!)^2}\,\sigma_X^2, \]

when we assume steadiness reached.


The values of the means of the differences with their probable errors on the
assumption of steadiness are given in Table IV, and the ratio of the means to
their probable errors in Table V.
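
The step from the mean of the $(r+1)$th differences to its probable error rests on two facts: the sum of $n$ consecutive $(r+1)$th differences telescopes to the difference of two $r$th differences, and the squared binomial coefficients of order $r$ sum to $(2r)!/(r!)^2$. The sketch below checks both by exact integer arithmetic. It is ours, not the authors' working; the function names are hypothetical, and it assumes uncorrelated $X$'s of equal variance and $n > r$, so that the two end groups do not overlap.

```python
import math


def coefficients_of_summed_differences(order, n):
    """Coefficient of each X_k in the sum of n consecutive backward
    differences of the given order, indexed by k."""
    coef = {}
    for i in range(n):
        for j in range(order + 1):
            coef[i - j] = coef.get(i - j, 0) + (-1) ** j * math.comb(order, j)
    return coef


def pe_mean_difference(r, sigma_x, n):
    """Probable error of the mean (r+1)th difference of n values."""
    return 0.67449 * math.sqrt(2 * math.comb(2 * r, r)) * sigma_x / n


n = 50
for r in range(0, 9):
    coef = coefficients_of_summed_differences(r + 1, n)
    sum_of_squares = sum(c * c for c in coef.values())
    # Variance of the mean is sigma_X^2 * sum_of_squares / n^2.
    assert sum_of_squares == 2 * math.comb(2 * r, r)

# Illustration: sixth differences (r = 5) with an intrinsic s.d. of 1.59.
print(round(pe_mean_difference(5, 1.59, n), 4))
```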


It will be seen that the positive and negative signs are not scattered quite as 
much at random as we might have hoped and that this is especially the case in 
the infantile mortality differences†. If we take all the ratios of the means to
their probable errors except the first difference, we find their average value 1.16;
it should be of course 1.18. Of these ratios 33 are positive and 25 negative. If we
omit the ratios for the first year of life, we find 24 negative and 20 positive, while
the mean value = .98 as against 1.18, the theoretical ratio of the mean to its
probable error. It is obvious that the infantile mortality differences are those 
which are anomalous. Otherwise the mean differences vary fairly satisfactorily 


* Biometrika, Vol. x. p. 272. 

† It may be noted that at the beginning of the period we have the disturbing influence of war and
at the end of the period wholly changed conditions due to a great limitation of births. The means
depend on differences of mortality under these conditions.

round zero in the required manner. The interest of this test is that we see that
the bulk of the time effect has been removed even when we reach the second 
difference, a result confirmed by the fact that the correlation of the deathrates’ 
second differences is in every case already substantially negative. 
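
The theoretical value 1.18 used in this test is, on our reading, the ratio of the mean absolute deviation of a normally distributed quantity to its probable error. A short derivation, assuming each mean difference to be normally distributed about zero with standard deviation $\sigma$:

```latex
\[
\frac{E\,|z|}{\text{P.E.}}
  = \frac{\sigma\sqrt{2/\pi}}{.67449\,\sigma}
  = \frac{.79788}{.67449}
  \approx 1.183 .
\]
```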


(iii) A third set of tests are those which are based on the standard deviations 
of the differences. In the first place if we assume steadiness to have set in, we 
can calculate $\sigma_x$, the intrinsic standard deviation, from the known value of $\sigma_{\delta_m x}$ by 
means of Dr Anderson’s formula cited above (p. 496). Table VI gives the intrinsic 
values of $\sigma_x$, i.e. $\sigma_x$ as deduced from the variability of the differences. It will be 
at once observed that for the third difference the mortality ratios of the third, 
fourth and fifth years of life reach steady standard deviations. In the case of the 
first year of life it is not till the eighth difference that this result is reached, while 
in the case of the second year, it can hardly be said to have been obtained with 
the ninth difference. A distinction should be noted here of which the exact 
physical significance is not obvious to us. In the third, fourth and fifth years 
the intrinsic standard deviations fall to steady values, but in the first and second 
years they rise towards those values and these are just the cases where steady 
values are not absolutely reached. 
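The deduction of an intrinsic value from the variability of a difference can be sketched numerically. The formula on p. 496 is not reproduced in this section, so the sketch assumes it is the usual variate-difference relation, in which the variance of the m-th difference of a series of uncorrelated terms is the binomial coefficient C(2m, m) times the intrinsic variance. The function name and the numbers are illustrative only.

```python
from math import comb, sqrt

def intrinsic_sd(sd_of_difference: float, m: int) -> float:
    """Intrinsic s.d. deduced from the s.d. of the m-th differences.

    Assumption: var(m-th difference) = C(2m, m) * var(x), which holds when
    the successive terms of the series are uncorrelated.
    """
    return sd_of_difference / sqrt(comb(2 * m, m))

# Hypothetical illustration: an intrinsic s.d. of 2 gives third differences
# with s.d. 2 * sqrt(C(6, 3)) = 2 * sqrt(20); the function recovers the 2.
recovered = intrinsic_sd(2 * sqrt(20), 3)
print(recovered)
```

When the time-factor has not yet been removed, the value so deduced still changes from one order of difference to the next, which is the behaviour the table is used to detect.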


TABLE VI. 
Intrinsic Standard Deviations ($\sigma_x$). 

Order of   | Year: 0-1 ($m_1$) | Year: 1-2 ($m_2$) | Year: 2-3 ($m_3$) | Year: 3-4 ($m_4$) | Year: 4-5 ($m_5$) 
Difference |    ♂       ♀      |    ♂       ♀      |    ♂       ♀      |    ♂       ♀      |    ♂       ♀ 
1st        |   7.62    6.96    |   3.90    3.86    |   1.75    1.83    |   1.53    1.52    |   1.23    1.09 
2nd        |   7.89    7.22    |   4.13    4.09    |   1.67    1.81    |   1.29    1.34    |   1.00     .83 
3rd        |   8.14    7.45    |   4.32    4.28    |   1.62    1.78    |   1.21    1.29    |    .93     .80 
4th        |   8.34    7.63    |   4.47    4.42    |   1.59    1.76    |   1.18    1.29    |    .87     .77 
5th        |   8.50    7.76    |   4.60    4.54    |   1.58    1.76    |   1.17    1.28    |    .86     .77 
6th        |   8.61    7.83    |   4.71    4.63    |   1.59  [illegible] | [illegible] 1.28  |    .86     .78 
7th        |   8.66    7.84    |   4.80    4.72    |    -       -      |    -       -      |     -       - 
8th        |   8.68    7.85    |   4.88    4.78    |    -       -      |    -       -      |     -       - 
9th        |    -       -      |   4.97    4.82    |    -       -      |    -       -      |     -       - 


(iv) There is another test for the standard deviations of the differences 
deduced by Cave and Pearson from the Andersonian results and used by them in 
their memoir on Italian Index Values*, namely as steadiness is approached the 
ratio of the squares of standard deviation of successive differences should approach 
closer and closer to 4, the exact value being 

$$\frac{\sigma^2_{\delta_s m}}{\sigma^2_{\delta_{s-1} m}} = 4 - \frac{2}{s}.$$ 

* Biometrika, Vol. x. p. 346. 

Table VII shows how rapidly the system approximates to the theoretical values 
in the case of the higher differences. 
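The theoretical column of Table VII can be reproduced in a few lines. This is a sketch under the assumption that each intrinsic value of Table VI is the standard deviation of the s-th difference divided by the square root of C(2s, s); on that assumption an observed ratio is the theoretical ratio multiplied by the squared ratio of two successive intrinsic values. The function name is illustrative.

```python
from math import comb

def theoretical_ratio(s: int) -> float:
    """Ratio of the variance of the s-th difference to that of the (s-1)-th."""
    return comb(2 * s, s) / comb(2 * s - 2, s - 1)

# The binomial form agrees with 4 - 2/s for the orders used in Table VII.
for s in range(1, 7):
    assert abs(theoretical_ratio(s) - (4 - 2 / s)) < 1e-12
    print(s, round(theoretical_ratio(s), 3))

# First year of life, males: intrinsic values 7.89 (2nd) and 8.14 (3rd difference).
observed_third = theoretical_ratio(3) * (8.14 / 7.89) ** 2
print(round(observed_third, 3))
```

The last figure agrees, to the accuracy of the two-decimal entries used, with the 3.547 entered for males in the first year of life.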

On the basis of all the tests we have applied we may, we think, conclude that 
by the sixth difference we have reached values for the correlation of deathrates 
in successive years which are in all probability close to the organic or intrinsic 
values. Only in the first and second years of life is steadiness not absolutely 
reached, but for practical purposes but little change can be anticipated in the 
correlation coefficients. 


TABLE VII. 
Ratio of Squared Standard Deviations. 

Order |     $m_1$     |     $m_2$     |     $m_3$     |     $m_4$     |     $m_5$     | Mean  | Mean  | Theory 
      |   ♂      ♀    |   ♂      ♀    |   ♂      ♀    |   ♂      ♀    |   ♂      ♀    |   ♂   |   ♀   | 
1     |  .949   .956  |  .354   .369  |  .142   .144  |  .194   .192  |  .211   .181  |  .370 |  .368 | 2 
2     | 3.199  3.221  | 3.374  3.384  | 2.731  2.934  | 2.127  2.317  | 1.996  1.728  | 2.685 | 2.717 | 3 
3     | 3.547  3.552  | 3.638  3.633  | 3.133  3.240  | 2.944  3.086  | 2.896  3.109  | 3.232 | 3.324 | 3.333 
4     | 3.676  3.673  | 3.754  3.741  | 3.363  3.428  | 3.341  3.504  | 3.054  3.227  | 3.438 | 3.515 | 3.500 
5     | 3.738  3.723  | 3.811  3.793  | 3.591  3.604  | 3.509  3.562  | 3.504  3.641  | 3.631 | 3.665 | 3.600 
6     | 3.756  3.734  | 3.848  3.828  | 3.690  3.683  | 3.708  3.648  | 3.691  3.758  | 3.739 | 3.730 | 3.667 


(5) We can look at the association of deathrates in successive years from 
another standpoint. We can ask if there be an increase of 10 points in the 
deathrate for a given year, what increase or decrease will there be of deathrate 
in the same group in the following year? 


In Table VIII below the second column gives the spurious change which is 
apparent in the crude data, the third column gives the real organic change which 
is discovered when the time-factor is removed. 
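The text does not state how the tabulated changes are computed. A plausible reading, assumed here, is that each is a regression estimate: the increase of 10 multiplied by the correlation and by the ratio of the two standard deviations. The sketch applies this to females in the first and second years of life, using the sixth-difference correlation -.7188 quoted later in the paper and the sixth-difference intrinsic values 7.83 and 4.63 of Table VI. The function name is hypothetical.

```python
def change_in_following_year(r: float, sd_this_year: float,
                             sd_next_year: float, increase: float = 10.0) -> float:
    """Regression estimate of the change in the next year's deathrate.

    Assumption: the change is increase * r * sd_next / sd_this.
    """
    return increase * r * sd_next_year / sd_this_year

# Females, first year of life on second year of life, time-factor annulled.
females_first_on_second = change_in_following_year(-0.7188, 7.83, 4.63)
print(round(females_first_on_second, 2))
```

On this assumption the result is a decrease of rather more than 4 deaths per mille.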

TABLE VIII. 
Association of Deathrates without and with Annulment of Time-factor. 

Result of an increase of 10 deaths per mille in one year of life on the deaths per 
mille in the next year. 

Increase of 10 in     |   Disregarding Time-factor    |     Annulling Time-factor 
Deathrate of          |       ♂              ♀        |       ♂              ♀ 
1st Year on 2nd Year  | Increase 3.3   Increase 3.8   | Decrease 3.7   Decrease 4.3 
2nd Year on 3rd Year  | Increase 6.1   Increase 6.6   | Decrease 2.2   Decrease 2.5 
3rd Year on 4th Year  | Increase 6.9   Increase 6.7   | Decrease 5.2   Decrease 5.3 
4th Year on 5th Year  | Increase 7.0   Increase 6.8   | Decrease 5.1   Decrease 4.5 

It is easy to see how those who contented themselves with crude deathrates, 
making no allowance for the betterment of deathrates with the time, interpreted 
a higher deathrate in one year to mean a higher deathrate in the next year of life, 
and so questioned whether natural selection applied to civilised man. As a 
matter of fact we see that the true organic relationship of deathrates is much 
more probably summed up in the statement that a decrease or an increase of 
deathrate in one year of infancy or childhood is in each case followed by an 
increase or a decrease in the deathrate of the survivors of the same group in the 
following year. Disregarding the time-factor we have a result quite incompatible 
with natural selection; annulling the time-factor, we have a result not only 
compatible with natural selection, but very difficult of any other interpretation 
than that of a selective deathrate, i.e. a heavy mortality means a selection of the 
weaker members, and the exposure to risk in the following year of a selected 
or stronger population, which has accordingly a lesser deathrate. 
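The reversal described above can be imitated with artificial data. The sketch below is not the authors' data: it assumes two hypothetical series of 50 years that share a steadily falling trend, while their year-to-year disturbances are negatively related. The crude correlation is then dominated by the trend, and the correlation of the sixth differences shows the negative relation.

```python
import numpy as np

rng = np.random.default_rng(0)
years = 50

# Hypothetical deathrates: a common improvement with time plus disturbances.
trend = np.linspace(100.0, 50.0, years)
shock = rng.normal(0.0, 2.0, years)
first_year = trend + shock
# A heavy year (positive shock) is followed by a lighter rate among survivors.
next_year = 0.7 * trend - 0.7 * shock + rng.normal(0.0, 0.5, years)

def corr(a, b) -> float:
    return float(np.corrcoef(a, b)[0, 1])

crude = corr(first_year, next_year)
# Sixth differences remove the smooth time-factor (variate difference method).
organic = corr(np.diff(first_year, n=6), np.diff(next_year, n=6))
print(round(crude, 2), round(organic, 2))
```

The crude coefficient comes out strongly positive and the differenced coefficient strongly negative, which is the pattern of Table VIII.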


(6) We now turn to the problem of how far this influence extends, or 
probably it would be better to phrase it: how far this influence can be traced. 
It is not only that the age group we follow does not absolutely consist of the 
same individuals but even with those members that are the same there is very 
often change of environment due not to time but to a change of locality or 
of economic condition affecting individuals. Added to this there is a continuous 
immigration and emigration. But beyond these causes weakening the association, 
there is another difficulty of great importance arising from what has happened 
in the intervening years. We wish to find out how an increase of deathrate 
in the sth year of life affects the deathrate in the (s+2)th year of life, but 
the events in the (s+1)th year will largely dominate and, perhaps, screen the 
results we are seeking. Such problems are always arising in statistical research. 
For example, a child may resemble its grandfather simply because both grand- 
father and child are like the child’s father. We know that the problem is 
answered statistically by inquiring what is the relation between a character in 
the child and the grandparent for a constant value of the character in the parent. 
In precisely the same manner we must in the present problem inquire: What 
is the correlation between the deathrates in the sth and (s+2)th year of life 
for constant deathrate in the (s+1)th year of life? 
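The correlation for a constant value of a third character is given by the standard first-order partial correlation formula, sketched below. The figures in the example are hypothetical and chosen only to mirror the grandfather illustration: when the resemblance of child to grandfather is exactly what the two intermediate resemblances would produce, nothing remains once the father is held constant.

```python
from math import sqrt

def partial_correlation(r12: float, r13: float, r23: float) -> float:
    """Correlation of characters 1 and 2 for a constant value of character 3."""
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Hypothetical values: child-grandfather .25, child-father .5, grandfather-father .5.
child_grandfather_given_father = partial_correlation(0.25, 0.5, 0.5)
print(child_grandfather_given_father)  # -> 0.0
```

In the present problem characters 1 and 2 are the deathrates in the sth and (s+2)th years and character 3 is the deathrate in the (s+1)th year.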


TABLE IX. 
Influence of Natural Selection at Interval of Two Years. 

Partial Correlation of              | For constant    |    ♂    |    ♀ 
$\delta_6 m_1$ and $\delta_6 m_3$   | $\delta_6 m_2$  | -.4307  | -.5242 
$\delta_6 m_2$ and $\delta_6 m_4$   | $\delta_6 m_3$  | -.2555  | -.2058 
$\delta_6 m_3$ and $\delta_6 m_5$   | $\delta_6 m_4$  | -.1798  | -.3129 

We shall of course work with the sixth difference correlations in order to free 
ourselves substantially from the time-factor. 

Here again the judgment based on the partial correlation of the crude 
deathrates is in all six cases reversed. For every one of the partial coefficients 
of crude deathrates shows that for intervening year with a constant deathrate, 
an increase of deathrate in the earlier year means an increase, not a decrease in 
the later year. Actually an increase in the one year is shown in Table X in all 
cases to be followed by a decrease at two years’ interval. 


TABLE X. 
Influence of Natural Selection at Interval of Two Years. 

Result of an increase of 10 deaths per mille in the second following year. 

Increase of 10 in Deathrate of   | For constant deathrate of |      ♂        |      ♀ 
1st Year on that of 3rd Year     | 2nd Year                  | Decrease .81  | Decrease 1.28 
2nd Year on that of 4th Year     | 3rd Year                  | Decrease .61  | Decrease .52 
3rd Year on that of 5th Year     | 4th Year                  | Decrease .99  | Decrease 1.4 

It will be seen that these values are appreciable although far less important 
than the decreases produced in a following year by an increase in the immediately 
preceding year. Thus we judge that a selection of the weakly children in one 
year is largely influential on the deathrate of the immediately following year, and 
diminishes, as we might anticipate, with increase of time. 


Some objection might, however, be taken to the sixth difference correlations, 
when we consider deathrates of the same group two years apart. They are 

                                        Male.            Female. 
$r_{\delta_6 m_1 \cdot \delta_6 m_3}$   +.227 ± .159     +.200 ± .161 
$r_{\delta_6 m_2 \cdot \delta_6 m_4}$   +.339 ± .149     +.377 ± .144 
$r_{\delta_6 m_3 \cdot \delta_6 m_5}$   +.397 ± .142     +.393 ± .142 


It will be seen that while they are all of the same sign and fairly accordant for 
both sexes the probable errors are becoming very substantial relative to the 
coefficients. We have indeed too limited a range of years. 


(7) If now we take out the correlation coefficients of the sixth differences for 
three years’ interval, and again for four years’ interval we find great irregularities. 


                                        Male.                      Female. 
$r_{\delta_6 m_1 \cdot \delta_6 m_4}$   +.205 ± .161               +.035 ± .168 
$r_{\delta_6 m_2 \cdot \delta_6 m_5}$   [sign illegible] .030 ± .168   [sign illegible] .072 ± .167 
$r_{\delta_6 m_1 \cdot \delta_6 m_5}$   -.181 ± .163               -.251 ± .158 


The correlations now do not agree in sign, they are insignificant having regard 
to their probable errors, and there is no close correspondence for the two sexes. 
We should need a far longer period than 50 years to determine certainly even the 
signs of these correlations, and their real magnitudes would require still ampler 
data. It would appear impossible to assert on the basis of the above values of 
the correlations at three and four years’ intervals more than the insignificance 
of the associations between deathrates of the same groups at intervals of more 
than two years*. In other words the effect of intense selection appears to be 
exhausted after an interval of two years. The word “appears” is used purposely 
because there must be some spurious weakening of the effect due to our not being 
able to follow absolutely the same individuals. 


(8) We have further studied to some extent the relationship between the 
male and female deathrates. There is almost perfect correlation between male 
and female deathrates in any given year of life after we annul the time-factor. 
Thus, if we represent female deathrates by m’, we have as illustrations: 


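$r_{\delta_6 m_1 \cdot \delta_6 m_1'} = +.9905,$ 
$r_{\delta_6 m_2 \cdot \delta_6 m_2'} = +.9880,$ 
$r_{\delta_6 m_3 \cdot \delta_6 m_3'} = +.9687,$ 
$r_{\delta_6 m_4 \cdot \delta_6 m_4'} = +.9800.$ 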


Of course the sole significance of these values lies in the fact that years of 
stress, whether due to climatic or epidemic causes, affect equally infants or 
children of both sexes of the same age. But these very high values in our 
opinion cast considerable doubt on the partial correlations derived from them. 
We have in fact 

$${}_3r_{12} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1-r_{13}^2}\,\sqrt{1-r_{23}^2}} = \frac{N}{D},$$ 

and if we suppose $r_{12}$ and $r_{23}$ nearly equal, then if $r_{13}$ be of the above high value 
N will be extremely small, but D is also, owing to the presence of the factor 
$\sqrt{1-r_{13}^2}$, very small. Thus ${}_3r_{12}$ although it may be very considerable is the ratio 


* Actually the partial correlations of the sixth differences at three years’ interval based on the above 
values are: 

Correlation of                      | For constant                        |   ♂    |   ♀ 
$\delta_6 m_1$ and $\delta_6 m_4$   | $\delta_6 m_2$ and $\delta_6 m_3$   | +.526  | +.181 
$\delta_6 m_2$ and $\delta_6 m_5$   | $\delta_6 m_3$ and $\delta_6 m_4$   | +.251  | +.485 

These are certainly all positive, but they are irregular as between the sexes and probably quite 
unreliable for the reasons already given. Should a more extended experience show that there is a 
real if slight positive correlation between deathrates at three years’ interval, while there is 
considerable negative correlation at one and two years’ intervals, we should be compelled to discuss 
whether there may not be something periodic in the nature of the heavy and light deathrates of 
infancy and childhood. We have been unable to trace any sign of such periodicity either in the 
deathrates or in the graphs drawn, but we do not believe that a very short periodicity would be 
eliminated by the variate difference method using any moderate number of differences. We cannot on 
this point accept Dr Anderson’s view. See Biometrika, Vol. x. p. 279. 

of two small quantities and any disturbing cause which but slightly modifies the 
value of either $r_{12}$ or $r_{23}$ may even change the sign of N and so swing ${}_3r_{12}$ over from 
a considerable positive to a considerable negative value*. 
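The instability can be seen numerically. The sketch below takes three of the sixth-difference correlations quoted in this section (male with female infants .9905, female first with female second year -.7188, male first with female second year -.6674) and then alters the last by an arbitrary, purely hypothetical amount of .09 to imitate a disturbing cause. The function name is illustrative.

```python
from math import sqrt

def numerator_and_denominator(r12: float, r13: float, r23: float):
    """N and D of the partial correlation of 1 and 2 for constant 3."""
    n_term = r12 - r13 * r23
    d_term = sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    return n_term, d_term

r13, r23 = 0.9905, -0.7188
N, D = numerator_and_denominator(-0.6674, r13, r23)
partial = N / D

# Hypothetical disturbance of .09 in the one moderate correlation.
N_disturbed, _ = numerator_and_denominator(-0.6674 - 0.09, r13, r23)
partial_disturbed = N_disturbed / D
print(round(N, 4), round(D, 4), round(partial, 3), round(partial_disturbed, 3))
```

Both N and D are below one-tenth, their ratio is close to +.47, and the modest disturbance carries it to about the same magnitude with the opposite sign.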

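We can consider the correlations between the female deathrate in one year and 
the male deathrate in a second year, supposing of course time influence annulled. 
We have 

$r_{\delta_6 m_1 \cdot \delta_6 m_2'} = -.6674$   ($r_{\delta_6 m_1 \cdot \delta_6 m_2} = -.6879$), 
$r_{\delta_6 m_1' \cdot \delta_6 m_2} = -.7337$   ($r_{\delta_6 m_1' \cdot \delta_6 m_2'} = -.7188$), 
$r_{\delta_6 m_3 \cdot \delta_6 m_4'} = -.7313$   ($r_{\delta_6 m_3 \cdot \delta_6 m_4} = -.7032$), 
$r_{\delta_6 m_3' \cdot \delta_6 m_4} = -.7278$   ($r_{\delta_6 m_3' \cdot \delta_6 m_4'} = -.7313$). 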


Thus we see that the same remarkably high negative correlations exist between 
the male and female deathrates of successive years of groups born in the same year 
as exist between male and male or female and female deathrates within the same 
group in successive years. In fact in two out of the four correlations the cross 
relationships are higher than the direct, although the differences are scarcely 
significant. Here again there is nothing noteworthy, considering the very high 
correlations just noted to exist between the male and female deathrates of groups 
born in the same year. We can, however, endeavour to correct such values by 
finding the relationship between the deathrate in females in the first year of life 
and males born in the same year in their second year of life for a constant death- 
rate of males in the first year of life. Or still more stringently between the 
deathrates of females in the first year of life with males in the second year of life 
for constant male deathrate in the first year of life and constant female deathrate 
in the second year of life. We should anticipate that such values would come 
out small or insignificant, if our interpretation of the high negative correlations 
between deathrates of the same group in successive years of life be a correct one, 
i.e. that the high deathrate leaves a stronger population. For a heavy deathrate 
in the females of one year should not leave a stronger population of males for the 
following year after correction by partial correlation. 


We obtained the following correlations: 

${}_{\delta_6 m_1} r_{\delta_6 m_1' \cdot \delta_6 m_2} = -.5240 \pm .0692,$ 
${}_{\delta_6 m_1'} r_{\delta_6 m_1 \cdot \delta_6 m_2'} = +.4665 \pm .0746.$ 


* The reader must note that we say a “disturbing cause”; it is not the mere result of random 
sampling affecting N. The probable error of $N = r_{12} - r_{13}r_{23}$ for a sample of size n is given by 

$$.67449\,\sigma_N = \frac{.67449}{\sqrt{n}}\left\{D^2 - N^2\left[2(1-r_{13}^2) + 2(1-r_{23}^2) + 1 - r_{12}^2 - 3\right]\right\}^{\frac{1}{2}},$$ 

and is thus quite easy to calculate. We have tested it on a number of cases of partial correlations 
worked out for this paper and find that if $.67449\,\sigma_N$ is of the same order as N, then $.67449\,\sigma_{{}_3r_{12}}$ is 
of much the same order as ${}_3r_{12}$. In other words, if N is so small relative to its probable error that 
it might easily have a reversed sign, then ${}_3r_{12}$ is insignificant as compared to its probable error also. 
For example, N = .0446 and D = .0956 leads to ${}_3r_{12}$ = .4665 with a probable error of .0746. ${}_3r_{12}$ is 
accordingly considerable and significant, but the probable error of N is only .0105, and we can hardly 
suppose the sign of ${}_3r_{12}$ due to a random sampling variation in the sign of N. 
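The worked figure in this footnote can be checked directly. The sketch assumes n = 50, the total frequency mentioned elsewhere in the paper, and uses the three sixth-difference correlations that give N = .0446 and D = .0956; the function name is illustrative.

```python
from math import sqrt

def probable_error_of_N(r12: float, r13: float, r23: float, n: int) -> float:
    """Probable error of N = r12 - r13*r23 by the footnote's formula."""
    N = r12 - r13 * r23
    D = sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    bracket = 2 * (1 - r13 ** 2) + 2 * (1 - r23 ** 2) + (1 - r12 ** 2) - 3
    return 0.67449 / sqrt(n) * sqrt(D ** 2 - N ** 2 * bracket)

# r12: male infants with females in the second year; r13: male with female
# infants; r23: female infants with females in the second year.
pe_N = probable_error_of_N(-0.6674, 0.9905, -0.7188, 50)
print(round(pe_N, 4))
```

The value obtained is about .0105, a quarter of N itself, so a change of sign through random sampling alone is unlikely.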
These values were so startling and so contradictory, that we proceeded to 
eighth differences with the results: 

${}_{\delta_8 m_1} r_{\delta_8 m_1' \cdot \delta_8 m_2} = -.6018 \pm .0609,$ 
${}_{\delta_8 m_1'} r_{\delta_8 m_1 \cdot \delta_8 m_2'} = +.5481 \pm .0667,$ 

which emphasised as well as confirmed the previous results. 


Now it seems absurd to suppose that the deaths of female infants in one year 
can organically influence the deaths of males of the same group in the next year, 
or male infants the deaths of females in the successive year. But the extraordinary 
feature of these results is that while a high deathrate of female infants lessens the 
deathrate of males in the second year of life of the same group, a high deathrate 
of male infants increases the deathrate of females in the second year of life of the 
same group. 

In order to throw further light on the matter we investigated male and female 
deathrate correlations in the third and fourth years of life. We found 


${}_{\delta_6 m_3} r_{\delta_6 m_3' \cdot \delta_6 m_4} = -.2640 \pm .0887,$ 
${}_{\delta_6 m_3'} r_{\delta_6 m_3 \cdot \delta_6 m_4'} = -.0082 \pm .0954.$ 

The second is practically zero, the first of no importance having regard to the 
high values of the correlation of deathrates of groups of the same sex in the third 
and fourth years of life (♂: -.703 ± .085; ♀: -.731 ± .078). Had we come to 
these values at first we should have been content, but the cross relation between 
the infant deaths of one sex and the deaths in the second year of life of the 
opposite sex was undoubtedly puzzling. 


We then proceeded to still further limit our conditions by determining the 
partial correlation between female infants in one year and males in the second 
year of life of the same birth-year when the deathrates of the males in the first 
year of life and of the females in the second were both constant. We obtained 

${}_{\delta_6 m_1 \cdot \delta_6 m_2'} r_{\delta_6 m_1' \cdot \delta_6 m_2} = +.1632 \pm .0928,$ 
${}_{\delta_6 m_1' \cdot \delta_6 m_2} r_{\delta_6 m_1 \cdot \delta_6 m_2'} = +.2997 \pm .0868.$ 

Having regard to their probable errors these are of a quite different and 
negligible significance when compared with the values of 
${}_{\delta_6 m_1} r_{\delta_6 m_1' \cdot \delta_6 m_2}$ and ${}_{\delta_6 m_1'} r_{\delta_6 m_1 \cdot \delta_6 m_2'}$ 
given above. 
It is worth while noting that 

${}_{\delta_6 m_2'} r_{\delta_6 m_1' \cdot \delta_6 m_2} = -.2188 \pm .0908,$ 
${}_{\delta_6 m_2} r_{\delta_6 m_1 \cdot \delta_6 m_2'} = +.1088 \pm .0943$ 

also give values of no practical importance. Or, to annul the spurious influence 
of infantile deaths of one sex, A, on deaths in the second year of sex, B, of the 
same group, it is more effective to render constant the deaths of A in the second 
year of life than of B in the first year of life. 
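These partial coefficients of the first and second order can be recomputed from the six sixth-difference correlations quoted in this section by repeated use of the ordinary recursion for partial correlation. The labels `m1`, `f1`, `m2`, `f2` (male and female deathrates in the first and second years of life) and the function names are introduced here only for illustration.

```python
from math import sqrt

# Sixth-difference correlations quoted in the text.
R = {
    frozenset(("m1", "f1")): 0.9905,
    frozenset(("m2", "f2")): 0.9880,
    frozenset(("m1", "m2")): -0.6879,
    frozenset(("f1", "f2")): -0.7188,
    frozenset(("m1", "f2")): -0.6674,
    frozenset(("f1", "m2")): -0.7337,
}

def partial(a: str, b: str, given=()) -> float:
    """Correlation of a and b with every variable in `given` held constant."""
    if not given:
        return R[frozenset((a, b))]
    *rest, c = given
    rab, rac, rbc = partial(a, b, rest), partial(a, c, rest), partial(b, c, rest)
    return (rab - rac * rbc) / sqrt((1 - rac ** 2) * (1 - rbc ** 2))

female_infants_on_males = partial("f1", "m2", ("f2",))
male_infants_on_females = partial("m1", "f2", ("m2",))
second_order = partial("f1", "m2", ("m1", "f2"))
print(round(female_infants_on_males, 3), round(male_infants_on_females, 3),
      round(second_order, 2))
```

The first two results agree with the values just given to about three decimals. The second-order coefficient agrees only to about two, since it is a ratio of very small quantities and the inputs carry four figures.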
In the light of this result we have found the correlations between deathrates 
of sex A in the third and sex B in the fourth year of life, for constant deathrate 
of sex A in the fourth year of life. 

We have 

${}_{\delta_6 m_4'} r_{\delta_6 m_3' \cdot \delta_6 m_4} = -.0818 \pm .0948,$ 
${}_{\delta_6 m_4} r_{\delta_6 m_3 \cdot \delta_6 m_4'} = -.1477 \pm .0933.$ 

Both of these may be taken as zero, having regard to their probable errors. 

Thus on the whole, while the relation between the deathrate of a group of one 
sex in one year and the deathrate of the remainder in the following year of life 
appears after the annulment of the time-factor to be very considerable and 
negative, there does not appear to be any organic relation between the deathrate 
of sex A in one year and sex B in the following year, if we proceed by the method 
of partial correlation. But at the same time we believe that this method must 
be used with very considerable caution, and that to avoid erroneous conclusions the 
whole problem must be investigated from a variety of standpoints in cases like the 
present where one of the three total correlations is extremely high. The numerator 
N ranges in the cases we have been discussing from about .01 to .05 and with a 
small total frequency like 50, any disturbing cause, apart from random variation, 
may have marked influence*. 

(9) The conclusion which we have formed is that in the present problem of 
natural selection it is probably better to annul the environmental factor by 
the variate difference method rather than to proceed by the method of partial 
correlation as we have hitherto done. 


By the former method we have shown that for both sexes a heavy deathrate 
in one year of life means a markedly lower deathrate in the same group in the 
following year of life, and that this extends in a lessened degree to the year 
following that, but is not by the present method easy to trace further. It is 
difficult to believe that this important fact can be due to any other source than 
the influence of natural selection, i.e. a heavy mortality leaves behind it a stronger 
population. Nature is not concerned with the moral or the immoral, which are 
standards of human conduct, and the duty of the naturalist is to point out what 
goes on in Nature. There can now scarcely be a doubt that even in highly 
organised human communities the deathrate is selective, and physical fitness is 
the criterion for survival. To assert the existence of this selection and measure 
its intensity must be distinguished from advocacy of a high infant mortality as 
a factor of racial efficiency. This reminder is the more needful as there are not 
wanting those who assert that demonstrating the existence of natural selection in 
man is identical with decrying all efforts to reduce the infantile deathrate. 

We have to acknowledge the great assistance we have received from our 
colleague Miss Beatrice M. Cave in the laborious arithmetical work of this paper. 


* If F = N/D, where N and D are both small, but F finite, then $\delta F/F = \delta N/N - \delta D/D$ and small 
disturbances produce great results in F. 


FREQUENCY DISTRIBUTION OF THE VALUES OF THE 
CORRELATION COEFFICIENT IN SAMPLES FROM 
AN INDEFINITELY LARGE POPULATION. 


By R. A. FISHER. 


1. My attention was drawn to the problem of the frequency distribution of the 
correlation coefficient by an article published by Mr H. E. Soper* in 1913. Seeing 
that the problem might be attacked by means of geometrical ideas, which I had 
previously found helpful in the consideration of samples, I have examined the two 
articles by “Student,”† upon which Mr Soper’s more elaborate work was based, 
with a view to checking and verifying the conclusions there attained. 

“Student,” if I do not mistake his intention, desiring primarily to obtain 
a just estimate of the accuracy to be ascribed to the mean of a small sample, 
found it necessary to allow for the fact that the mean square error of such a 
sample is not generally equal to the standard deviation of the normal population 
from which it is drawn. He was led, in fact, to study the frequency distribution 
of the mean square error. He calculated algebraically the first four moments of 
this frequency curve, both about the zero point, and about its mean, observed 
a simple law to connect the successive moments, and discovered a frequency curve, 
which fitted his moments, and gave the required law. 


Thus if $x_1, x_2, \ldots x_n$ are the members of a sample, 

$$n\bar{x} = x_1 + x_2 + \ldots + x_n,$$ 

and 

$$n\mu^2 = (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \ldots + (x_n - \bar{x})^2,$$ 

the frequency with which the mean square error lies in the range $d\mu$ is 
proportional to 

$$\mu^{n-2} e^{-\frac{n\mu^2}{2\sigma^2}}\, d\mu.$$ 


This result, although arrived at by empirical methods, was established almost 
beyond reasonable doubt in the first of “Student’s” papers. It is, however, of 
interest to notice that the form establishes itself instantly, when the distribution 
of the sample is viewed geometrically. 

* Biometrika, Vol. ix. p. 91. † Ibid. Vol. vi. pp. 1 and 302. 

In the second of these two papers the more difficult problem of the frequency 
distribution of the correlation coefficient is attempted. For samples of 2 the 
frequency distribution between the only two possible values -1 and +1 was 
determined by Sheppard’s theorem to be in the ratio $\frac{\pi}{2} + \sin^{-1}\rho : \frac{\pi}{2} - \sin^{-1}\rho$, 
where $\rho$ is the correlation of the population. Besides this theoretical result, 
“Student” appeals only to experimental data. From these he derives an 
empirical form for the distribution when $\rho = 0$, and makes several valuable 
suggestions. It has been the greatest pleasure and interest to myself to observe 
with what accuracy “Student’s” insight has led him to the right conclusions. 
The form when $\rho = 0$ is absolutely correct, and as a further instance I may quote 
the remark* “I have dealt with the cases of samples of 2 at some length, because 
it is possible that this limiting value of the distribution, with its mean of 
$\frac{2}{\pi}\sin^{-1}\rho$ and its second moment coefficient of $1 - \left(\frac{2}{\pi}\sin^{-1}\rho\right)^2$, may furnish a clue 
to the distribution when n is greater than 2.” As a matter of fact it is just these 
quantities with which we shall be concerned. 
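The mean quoted for samples of 2 is easily checked by experiment. The following is a sketch with simulated normal pairs, not "Student's" data; the value .66 for the population correlation, the seed and the number of samples are arbitrary choices. For two pairs the correlation coefficient is simply the sign of the product of the two differences.

```python
import numpy as np
from math import asin, pi

rho = 0.66  # illustrative population correlation
rng = np.random.default_rng(1)
samples = 200_000

# Each row holds the x values (and, below, the y values) of one sample of 2.
x = rng.standard_normal((samples, 2))
e = rng.standard_normal((samples, 2))
y = rho * x + np.sqrt(1.0 - rho ** 2) * e

# With two pairs, r is +1 or -1 according to the sign of (x1 - x2)(y1 - y2).
r = np.sign((x[:, 0] - x[:, 1]) * (y[:, 0] - y[:, 1]))

observed_mean = float(r.mean())
expected_mean = 2.0 / pi * asin(rho)
print(round(observed_mean, 3), round(expected_mean, 3))
```

Since r takes only the values -1 and +1, its second moment about the mean is one minus the square of the mean, which is the second quantity in the quoted remark.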


To Mr Soper’s laborious and intricate paper I cannot hope to do justice. 
I have been able to establish the substantial accuracy and value of his approxima- 
tions. It is one of the advantages of approaching a problem from opposite 
standpoints that Mr Soper’s forms are most accurate for those larger values of n, 
where the exact formulae become most complicated. 


2. The problem of the frequency distribution of the correlation coefficient r, 
derived from a sample of n pairs, taken at random from an infinite population, 
may be solved, when that population can be represented by a normal surface, 
with the aid of certain very general conceptions derived from the geometry of 
n dimensional space. In this paper the general form will first be demonstrated, 
and for a few important cases some of the successive moments will be derived. 
Incidentally it will be of interest to compare the exact form with Mr Soper’s 
approximation, and with reference to the experimental data supplied by “Student.” 


If the frequency distribution of the population be specified by the form 

$$df = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\, e^{-\frac{1}{1-\rho^2}\left\{\frac{(x-m_1)^2}{2\sigma_1^2} - \frac{2\rho(x-m_1)(y-m_2)}{2\sigma_1\sigma_2} + \frac{(y-m_2)^2}{2\sigma_2^2}\right\}}\, dx\,dy,$$ 

where df is the chance that any observation should fall into the range dx dy, then 
the chance that n pairs should fall within their specified elements is 

$$\frac{1}{(2\pi\sigma_1\sigma_2\sqrt{1-\rho^2})^n}\, e^{-\frac{1}{1-\rho^2} S\left\{\frac{(x-m_1)^2}{2\sigma_1^2} - \frac{2\rho(x-m_1)(y-m_2)}{2\sigma_1\sigma_2} + \frac{(y-m_2)^2}{2\sigma_2^2}\right\}}\, dx_1\,dy_1 \ldots dx_n\,dy_n \quad \ldots (\mathrm{I}),$$ 

and this we interpret as a simple density distribution in 2n dimensions. 


* Biometrika, Vol. vi. p. 304. 

For the variables x and y it is now necessary to substitute the statistical 
derivatives determined by the equations 

$$n\bar{x} = S(x), \qquad n\bar{y} = S(y),$$ 
$$n\mu_1^2 = S(x-\bar{x})^2, \qquad n\mu_2^2 = S(y-\bar{y})^2,$$ 
$$nr\mu_1\mu_2 = S(x-\bar{x})(y-\bar{y}),$$ 

and it is evident that the only difficulty lies in the expression of an element of 
volume in 2n dimensional space in terms of these derivatives. 


The five quantities above defined have, in fact, an exceedingly beautiful 
interpretation in generalised space, which we may now examine. 


3. Considering first the space of n dimensions in which the variations of x 
are represented, the mean and mean square error of n observations are determined 
by the relations of P, the point representing the n observations, to the line 

$$x_1 = x_2 = x_3 = \ldots = x_n,$$ 

for the perpendicular PM drawn from P upon this line will lie in the region 

$$x_1 + x_2 + \ldots + x_n = n\bar{x},$$ 

and will meet it at the point M, where 

$$x_1 = x_2 = \ldots = x_n = \bar{x};$$ 

further, since $PM^2 = (x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \ldots + (x_n-\bar{x})^2$, 
the length of PM is $\mu_1\sqrt{n}$. 


An element of volume in this n dimensional space may now without difficulty 
be specified in terms of $\bar{x}$ and $\mu_1$; for, given $\bar{x}$ and $\mu_1$, P must lie on a sphere in 
n-1 dimensions, lying at right angles to the line OM, and the element of 
volume is 

$$C\mu_1^{n-2}\, d\mu_1\, d\bar{x},$$ 

where C is some constant, which need not be determined. 
The point in 2n dimensional space which is represented by the n pairs of 
observations must be such that its projection on the n dimensional space, in 
which x is represented, lies upon a certain sphere of radius $\mu_1\sqrt{n}$, and on the space 
in which y is represented, upon another sphere of radius $\mu_2\sqrt{n}$, and now, when we 
come to the interpretation of r, we must observe that to each point on the first 
sphere there corresponds a certain point on the second sphere, to which it bears 
the relation 

$$\frac{x_1-\bar{x}}{y_1-\bar{y}} = \frac{x_2-\bar{x}}{y_2-\bar{y}} = \ldots = \frac{x_n-\bar{x}}{y_n-\bar{y}}.$$ 

In general this relation does not hold for the n pairs of observations, and the 
two projections will not fall at corresponding points on the two spheres. If now 
one of the spheres be turned round so as to occupy the same space as the other, 
and so that the lines upon which $x_1$ and $y_1$, and the other pairs of coordinates, are 
measured, coincide, then corresponding points will lie on the same radii, and the 
correlation coefficient r measures the cosine of the angle between the radii to the 
two points specified by the observations. 
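The geometrical statement is easily verified on any small sample. In the sketch below the six pairs of observations are invented for the purpose. The deviations from the two means form the radii PM in the two spaces, and the cosine of the angle between them is compared with the correlation coefficient computed in the usual way.

```python
import numpy as np

# Hypothetical sample of n = 6 pairs.
x = np.array([2.1, 3.4, 1.9, 5.0, 4.2, 3.3])
y = np.array([1.0, 2.8, 1.5, 4.1, 2.9, 3.6])
n = len(x)

dx = x - x.mean()   # radius in the space of x
dy = y - y.mean()   # radius in the space of y

mu1 = np.sqrt(np.mean(dx ** 2))   # mean square error of x
cos_angle = float(dx @ dy / (np.linalg.norm(dx) * np.linalg.norm(dy)))
r = float(np.corrcoef(x, y)[0, 1])

print(round(cos_angle, 6), round(r, 6))
print(round(float(np.linalg.norm(dx)), 6), round(float(mu1 * np.sqrt(n)), 6))
```

The second line of output confirms that the length of the radius is the mean square error multiplied by the square root of n.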


Taking one of the projections as fixed at any point on the sphere of radius $\mu_2\sqrt{n}$, 
the region for which r lies in the range dr, is a zone, on the other sphere in n-1 
dimensions, of radius $\mu_1\sqrt{n}\sqrt{1-r^2}$, and of width $\mu_1\sqrt{n}\,dr/\sqrt{1-r^2}$, and therefore 
having a volume proportional to $\mu_1^{n-2}(1-r^2)^{\frac{n-4}{2}}\,dr$. 


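4. We may now turn to the direct simplification of the expression (I), at each 
stage discarding any factors which do not involve r. 

$$e^{-\frac{1}{1-\rho^2} S\left\{\frac{(x-m_1)^2}{2\sigma_1^2} - \frac{2\rho(x-m_1)(y-m_2)}{2\sigma_1\sigma_2} + \frac{(y-m_2)^2}{2\sigma_2^2}\right\}}\, dx_1\,dy_1\,dx_2\,dy_2 \ldots dx_n\,dy_n$$ 

may be reduced to 

$$e^{-\frac{n}{1-\rho^2}\left\{\frac{(\bar{x}-m_1)^2+\mu_1^2}{2\sigma_1^2} - \frac{2\rho\{r\mu_1\mu_2 + (\bar{x}-m_1)(\bar{y}-m_2)\}}{2\sigma_1\sigma_2} + \frac{(\bar{y}-m_2)^2+\mu_2^2}{2\sigma_2^2}\right\}}\, d\bar{x}\,d\bar{y}\,\mu_1^{n-2}\,d\mu_1\,\mu_2^{n-2}\,d\mu_2\,(1-r^2)^{\frac{n-4}{2}}\,dr,$$ 

or to 

$$e^{-\frac{n}{1-\rho^2}\left\{\frac{\mu_1^2}{2\sigma_1^2} - \frac{2\rho r\mu_1\mu_2}{2\sigma_1\sigma_2} + \frac{\mu_2^2}{2\sigma_2^2}\right\}}\, \mu_1^{n-2}\mu_2^{n-2}\,(1-r^2)^{\frac{n-4}{2}}\,d\mu_1\,d\mu_2\,dr.$$ 

In order to integrate this expression from 0 to $\infty$, with respect to $\mu_1$ and $\mu_2$, let 

$$\zeta = \frac{\mu_1\mu_2}{\sigma_1\sigma_2}, \qquad e^z = \frac{\mu_1\sigma_2}{\mu_2\sigma_1},$$ 

and we have 

$$\int_{-\infty}^{\infty} dz \int_0^{\infty} d\zeta\; \zeta^{n-2}\, e^{-\frac{n\zeta}{1-\rho^2}(\cosh z - \rho r)}\,(1-r^2)^{\frac{n-4}{2}}\,dr,$$ 

or 

$$\int_0^{\infty} \frac{dz}{(\cosh z - \rho r)^{n-1}}\,(1-r^2)^{\frac{n-4}{2}}\,dr,$$ 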
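which, on substituting $\cos\theta$ for $-\rho r$, may be expressed in terms of a Legendre 
function in the form 

$$(i\,\mathrm{cosec}\,\theta)^{n-1}\, Q_{n-2}(i\cot\theta)\,(1-r^2)^{\frac{n-4}{2}}\,dr \quad \ldots (\mathrm{II}).$$ 

Now 

$$\int_0^{\infty} \frac{dz}{\cosh z + \cos\theta} = \frac{\theta}{\sin\theta},$$ 

$$\int_0^{\infty} \frac{dz}{(\cosh z + \cos\theta)^{n-1}} = \frac{1}{(n-2)!}\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^{n-2}\frac{\theta}{\sin\theta},$$ 

and since this is a function of $\rho r$ only, with $\sin\theta\,\partial\theta = \partial(\rho r)$, we may express 
the frequency distribution by the convenient expression 

$$(1-r^2)^{\frac{n-4}{2}}\,\frac{\partial^{n-2}}{\partial(\rho r)^{n-2}}\left(\frac{\theta}{\sin\theta}\right)dr.$$ 
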
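Professor Pearson has shown that this last result can be obtained directly 
from Sheppard’s theorem* that 

$$\frac{1}{2\pi\Sigma_1\Sigma_2\sqrt{1-R^2}}\int_0^{\infty}\!\!\int_0^{\infty} e^{-\frac{1}{2(1-R^2)}\left(\frac{x^2}{\Sigma_1^2} - \frac{2Rxy}{\Sigma_1\Sigma_2} + \frac{y^2}{\Sigma_2^2}\right)}\,dx\,dy = \frac{\cos^{-1}(-R)}{2\pi};$$ 

making the substitutions 

$$x = \mu_1, \qquad y = \mu_2,$$ 
$$\frac{1}{(1-R^2)\Sigma_1^2} = \frac{n}{(1-\rho^2)\sigma_1^2}, \qquad \frac{1}{(1-R^2)\Sigma_2^2} = \frac{n}{(1-\rho^2)\sigma_2^2},$$ 
$$\frac{R}{(1-R^2)\Sigma_1\Sigma_2} = \frac{nr\rho}{(1-\rho^2)\sigma_1\sigma_2},$$ 

which give $R = \rho r$ and $\cos^{-1}(-R) = \theta$, we obtain 

$$\frac{n}{\sigma_1\sigma_2(1-\rho^2)}\int_0^{\infty}\!\!\int_0^{\infty} e^{-\frac{n}{1-\rho^2}\left\{\frac{\mu_1^2}{2\sigma_1^2} - \frac{2\rho r\mu_1\mu_2}{2\sigma_1\sigma_2} + \frac{\mu_2^2}{2\sigma_2^2}\right\}}\,d\mu_1\,d\mu_2 = \frac{\theta}{\sin\theta},$$ 

and hence differentiating (n - 2) times with respect to r, the required expression 
is obtained. 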


5. The form which we have now obtained may be applied without difficulty 
to all small even values of n, and in such cases is peculiarly suitable for the 
calculation of moments. 


When n = 2 the ordinate of the curve, with abscissa r, is 

$$\frac{\theta}{(1-r^2)\sin\theta},$$ 

which becomes hyperbolic in the neighbourhoods of -1 and +1. The value 
of r is, therefore, as we know, either -1 or +1, and the proportion, in which 
these occur, depends upon $\rho$. The ratio of the infinite areas included with the 
asymptotes of the above curve is 

$$\frac{\cos^{-1}\rho}{\cos^{-1}(-\rho)},$$ 

so that the mean value of a number of observations is $\frac{2}{\pi}\sin^{-1}\rho$. 

* Phil. Trans. Vol. 192, A, p. 141. 

When n = 4 there is still no approach to normality, the curve takes the form 

$$\frac{1}{\sin^3\theta}(\theta - 3\cot\theta + 3\theta\cot^2\theta),$$ 

which, when r is positive, increases regularly from its value of $\frac{4}{15}$ when $\theta = 0$, to 
infinity, to which it approaches as $\theta$ approaches $\pi$. Unless $\rho$ is actually equal 
to 1, in which case r is also 1 of necessity, the curve has finite ordinates at both 
extremes. For calculating the number of values which should fall within any 
given range, the integral, $\frac{1}{\sin^2\theta}(1 - \theta\cot\theta)$, may be directly tabulated, as has 
been done in forming the accompanying table of “Student’s” observations, and 
the corresponding expectations. The values given by Mr Soper’s formula are 
apposed for comparison. 
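The curve for n = 4 can be examined numerically. In the sketch below the multiplier $(1-\rho^2)^{3/2}/\pi$ is not taken from the text, which discards factors not involving r; it is the constant that makes the whole area unity, as the computation confirms. The value .66 for the population correlation is only an illustration, and the function names are hypothetical.

```python
from math import acos, sin, tan, pi

def curve_n4(theta: float) -> float:
    """(theta - 3 cot(theta) + 3 theta cot^2(theta)) / sin^3(theta)."""
    cot = 1.0 / tan(theta)
    return (theta - 3.0 * cot + 3.0 * theta * cot ** 2) / sin(theta) ** 3

def density_n4(r: float, rho: float) -> float:
    """Frequency curve of r for samples of 4, with cos(theta) = -rho * r."""
    theta = acos(-rho * r)
    # Assumed normalising constant: (1 - rho^2)^(3/2) / pi.
    return (1.0 - rho ** 2) ** 1.5 / pi * curve_n4(theta)

def area(rho: float, steps: int = 20000) -> float:
    """Midpoint-rule integral of the curve over -1 < r < +1."""
    h = 2.0 / steps
    return h * sum(density_n4(-1.0 + (k + 0.5) * h, rho) for k in range(steps))

total = area(0.66)
flat = density_n4(0.3, 0.0)
print(round(total, 6), round(flat, 6))
```

With no correlation in the population the curve for samples of 4 is a rectangle of height one half, and for a positive correlation the ordinates are finite at both extremes, as stated above.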


Table for comparison with p. 114, Biometrika, Vol. IX. 


H.E.Sopevr’s 


Calculated ae 3 i 
r frequency | Observed ri Sees oe approxi- Diasec g 
m : m mation g m 
‘905—1 | 202-1 175-5 230°3 ge 
*805—905| 1249 | 136-5 } eRe 69 | “98-9 } pie 20 
705—805| 88-7 84 } 72°1 | 
— 38 09 +203 | 318 
-605— 65:1 66 57°6 
Peoer =e a t +123. { 173 |. Zee \ +118 | 158 
305— 30°6 245 34°3 x 
205— 24°8 24:5 } = Oe Te | 99-7 i Boe 2 
105— 20°5 19 25°6 2 
aoe aa 7 \ =11-6 9). 2:55" sllemsets } 21:6 | 9-80 
1-905— 145 22 18°8 2 
1-705 — 10:7 ee oo NE at ee * 
ef a 
‘o re 5 . . i . . 
oe he iG } 412-7 | 10-54 oe $121 | 9°21 
Pape oe ca } +51] 219 He + 86 | 8-80 
1105— 5-1 ES NS : 19 . ; 
ie ee ae ot ee lets 2 } +105 | 44-10 


— 745 — 23°61 — = 84°17 
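The tabulation of expectations for n = 4 can be sketched as follows. This is an illustration under stated assumptions, not the computation actually used for the table: the normalising constant $(1-\rho^2)^{3/2}/\pi$ is taken from the general frequency form with n = 4, the relation $\cos\theta = -\rho r$ is used throughout, and the function names are hypothetical.

```python
import math

def _g(r, rho):
    """(1 - theta*cot(theta)) / sin(theta)^2, where cos(theta) = -rho * r."""
    theta = math.acos(-rho * r)
    s = math.sin(theta)
    return (1.0 - theta * math.cos(theta) / s) / (s * s)

def cdf_r_n4(r, rho):
    """Chance that the sample correlation is <= r when n = 4.

    Assumes 0 < |rho| < 1 and -1 <= r <= 1. The constant
    (1 - rho^2)^(3/2) / (pi * rho) is an assumption taken from the
    general frequency formula, not a value read from the table.
    """
    const = (1.0 - rho * rho) ** 1.5 / (math.pi * rho)
    return const * (_g(r, rho) - _g(-1.0, rho))

def expected_count(lo, hi, rho, total):
    """Expected number of samples, out of `total`, with lo < r <= hi."""
    return total * (cdf_r_n4(hi, rho) - cdf_r_n4(lo, rho))

print(expected_count(0.905, 1.0, 0.66, 745))
```

With this constant the integral runs from 0 at r = −1 to 1 at r = +1, which is the property the tests below check; for very small $\rho$ it approaches the uniform distribution on (−1, +1).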
6. The direct process of integration by parts applied to such expressions as 


-4 
ae 1 (eB > : on 02 ' 
| ae ey, and es (1 —r*) T ani 9 Os 
: : is: oP & 
when n is even, merely introduces the sums and differences of the terms AP 


at the extremes, where r is —1 or +1, with coefficients which are, in any 
particular case, easily calculable. 


Thus, n being 6, 


OLE bat Cd AE CGE a. Oe yn 
ie =) ga= ja 5 |. a - (25 oI, 
= 2 x the sum of the extreme values of 6 7p 9 — 3 cot 0 + 30 cot? A) 


(1 — @cot 8). 


— 2x the difference of the extreme values of 3 r 


If $\rho = \sin\alpha$, so that the extreme values of $\theta$ are $\frac{\pi}{2}-\alpha$ and $\frac{\pi}{2}+\alpha$, the sums and
differences may readily be expressed in terms of $\alpha$, and the first few may here be
tabulated: the table has been carried back as far as is necessary for the calculation
of the fourth moment.


sum difference 
sin?6 (7+26? 7-66? m cot? 6 
po Bee OROs= 2 = ————— 3 te an? 
ap? a 36 cot 6 7 cot of m (a+3 tan a+3 a tan?a) 
; 62 2 
= {6 +#(1 -5) cot a cota (1+a tan a) cota {a-2ton a+ (F +0’) tan a} 
6? 1 : 
3 Zi +a? Ta 
snd 5 mw tana 2a tan a 
= 7 ae 6 cot 6) 2 tan? a (1+a tan a) m tan? a 
— Gs 3 cot 6 +36 cot? 6) a tan? a (143 tan? a) 2 tan? a(a+3 tan a+3a tan? a) 
a (4-96 cot 6415 cot? 6-156 cot? @) | 2tanta(4+9atana+15tan2a+ 15a tana) | r tanta (9 tan a+15 tan’ a) 


There are here two natural series, which appear alternately as sums and 
differences; the simpler, which may be expressed in the form 


7 sin? a (. ae 
2 cos ada ‘ 
is essentially a series of Legendre functions of the first kind; and may be 
expressed as 
Be 


i 
= . tan? a ea Py (1 tan a) ; 


and it is these only which occur in the evaluation of the even moments. 


7. It is, however, desirable to obtain general expressions for these integrals
in terms of n and $\rho$, and to evaluate them when n is odd.


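For this purpose let us introduce a quantity $\phi$, such that

$$\cos\phi = \cos\theta - k,$$

then, when k is sufficiently small, we may expand $\phi^2$ by Taylor’s theorem, so that

$$\frac{\phi^2}{2} = \frac{\theta^2}{2} + k\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)\frac{\theta^2}{2} + \frac{k^2}{2!}\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^2\frac{\theta^2}{2} + \ldots$$

Now let $k = \rho h\sqrt{1-r^2}$,

then $$\frac{\phi^2}{2} = \frac{\theta^2}{2} + \rho h\sqrt{1-r^2}\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)\frac{\theta^2}{2} + \frac{\rho^2h^2(1-r^2)}{2!}\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^2\frac{\theta^2}{2} + \ldots,$$

and differentiating twice with respect to h

$$\rho^2(1-r^2)\left(\frac{\partial}{\sin\phi\,\partial\phi}\right)^2\frac{\phi^2}{2} = \rho^2(1-r^2)\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^2\frac{\theta^2}{2} + \rho^3h\,(1-r^2)^{\frac{3}{2}}\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^3\frac{\theta^2}{2} + \ldots,$$

whence, dividing by $(1-r^2)^{\frac{3}{2}}$, we obtain

$$\frac{\rho^2}{\sqrt{1-r^2}}\,\frac{1-\phi\cot\phi}{\sin^2\phi} = \ldots + \frac{\rho^{n-1}h^{n-3}}{(n-3)!}\,(1-r^2)^{\frac{n-4}{2}}\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^{n-1}\frac{\theta^2}{2} + \ldots,$$

so that $$\int_{-1}^{1} r^p\,(1-r^2)^{\frac{n-4}{2}}\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^{n-1}\frac{\theta^2}{2}\,dr$$

may be obtained by multiplying by $(n-3)!$ the coefficient of $h^{n-3}$ in

$$\int_{-1}^{1}\frac{r^p\,dr}{\rho^{n-3}\sqrt{1-r^2}}\;\frac{1-\phi\cot\phi}{\sin^2\phi},$$

when $\cos\phi = \cos\theta - \rho h\sqrt{1-r^2} = -\rho\,(r + h\sqrt{1-r^2})$.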


Our object might equally be achieved by the evaluation of the integral 


ie ner ( (oy ) 
fe -1 1—7°\sing sin @/" 


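The quantity $\phi$ is determined by the equation

$$\cos\phi = \cos\theta - \rho h\sqrt{1-r^2},$$

that is $\cos\phi = -\rho\,(r + h\sqrt{1-r^2})$.

If now $r = \sin\beta$,
$h = \tan\epsilon$,
then $\cos\theta = -\rho\sin\beta$,

$$\cos\phi = -\rho\sqrt{1+h^2}\,\sin(\beta+\epsilon) = -\rho\sqrt{1+h^2}\,\sin\beta',$$

and as r passes from −1 to +1,

$\beta$ passes from $-\frac{\pi}{2}$ to $\frac{\pi}{2}$,

$\theta$ from $\frac{\pi}{2}-\alpha$ to $\frac{\pi}{2}+\alpha$,

$\beta'$ from $-\frac{\pi}{2}+\epsilon$ to $\frac{\pi}{2}$ and thence to $\frac{\pi}{2}+\epsilon$,

and $\phi$ from $\frac{\pi}{2}-\alpha$ to $\frac{\pi}{2}+\alpha'$ and thence back to $\frac{\pi}{2}+\alpha$,

where $\sin\alpha' = \rho\sqrt{1+h^2}$; $\phi$ oscillates in the same manner as $\theta$, with a somewhat
greater amplitude, and slightly in advance in respect of phase.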


1 dr 
P si? VI- 9 


The expression 
may now be reduced to 


pe ae : pa ’ 
e| - aag (. 1 & g sin a sue .) ae’ 
= +e 


sin? d zi —sin?a’ sin? 8’ (1 —sin?a’ sin? B’)? 


us 
2 


ane] ats sina’ sin 8’ dp’ 


= (0 ie 2 28 : ; $ 
zs » 1 —sin?a’ sin’ — sin’ a’ sin (1 — sin? a’ sin? 8’)? 


+p fi (f) sin a’ sin PB’ dp’ 


a an — sin? a’ sin? 6’)? 


2 2 a7 4 7 2 
_puw | 7p ae (=) + ™p (1 — cosa’) 


cos a” cos?a’ \cosa cos? a’ 
pr sin a tan é€ 
~ cos? a’ cos a 
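but $\cos^2\alpha' = 1 - \rho^2(1+h^2) = \cos^2\alpha - \sin^2\alpha\tan^2\epsilon$,

so that $$\rho^2\int_{-1}^{1}\frac{1-\phi\cot\phi}{\sin^2\phi}\,\frac{dr}{\sqrt{1-r^2}} = \frac{\pi\tan^2\alpha}{1-\tan\alpha\tan\epsilon}.$$

From this evaluation we deduce the general form

$$\int_{-1}^{1}(1-r^2)^{\frac{n-4}{2}}\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^{n-1}\frac{\theta^2}{2}\,dr = \frac{\pi\,(n-3)!}{(1-\rho^2)^{\frac{n-1}{2}}} \quad\ldots\ldots\text{(III)}.$$

The absolute frequency df, with which r falls in the range dr, is therefore

$$df = \frac{(1-\rho^2)^{\frac{n-1}{2}}}{\pi\,(n-3)!}\,(1-r^2)^{\frac{n-4}{2}}\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^{n-1}\frac{\theta^2}{2}\,dr.$$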
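The evaluation on which (III) rests, namely $\rho^2\int_{-1}^{1}\frac{1-\phi\cot\phi}{\sin^2\phi}\frac{dr}{\sqrt{1-r^2}} = \frac{\pi\tan^2\alpha}{1-\tan\alpha\tan\epsilon}$ with $\cos\phi = -\rho(r+h\sqrt{1-r^2})$, $\rho=\sin\alpha$ and $h=\tan\epsilon$, can be checked numerically. The sketch below is only a check, not part of the argument; it assumes the substitution $r=\sin\beta$ to remove the end-point singularity, uses Simpson's rule, and requires $\alpha+\epsilon<\frac{\pi}{2}$. Function names are illustrative.

```python
import math

def lhs(rho, h, steps=2000):
    """rho^2 * integral of (1 - phi*cot(phi)) / sin(phi)^2 over beta.

    With r = sin(beta), dr / sqrt(1 - r^2) = d(beta), and
    cos(phi) = -rho * (sin(beta) + h * cos(beta)). `steps` must be even.
    """
    def f(beta):
        c = -rho * (math.sin(beta) + h * math.cos(beta))  # cos(phi)
        phi = math.acos(c)
        s2 = 1.0 - c * c                                   # sin(phi)^2
        return (1.0 - phi * c / math.sqrt(s2)) / s2
    a, b = -math.pi / 2.0, math.pi / 2.0
    width = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * width)
    return rho * rho * total * width / 3.0

def rhs(rho, h):
    """pi * tan(alpha)^2 / (1 - tan(alpha) * tan(epsilon))."""
    tan_alpha = rho / math.sqrt(1.0 - rho * rho)
    return math.pi * tan_alpha ** 2 / (1.0 - tan_alpha * h)

print(lhs(0.5, 0.3), rhs(0.5, 0.3))
```

Expanding the right-hand side in powers of h gives $\pi\tan^{n-1}\alpha$ as the coefficient of $h^{n-3}$, from which (III) follows on multiplying by $(n-3)!/\rho^{n-1}$.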
8. I do not see how to integrate the other expressions of the type 
en rv dr 
es sin?d VJ — 7?’ 


although a form could probably be obtained when p is even. The general 
expression for the second moment may, however, be deduced by means of a 
reduction formula. 


By a process of integration by parts it appears that, if we write 


n-4 ne 23 Q 
n—1 2 
ie (ies 2)? ares Nl 2 dr =Ln.p, 
then Lnse.2 = Into.0+ WLn,9 —u(n— Dies 
: tan® 
and since i — an (= “—tana+a), 


we may obtain successively 


tan?® tan? 
I 6.9 = 247 ee eesee “+ tana—a), 
4 3 
an7 7? 3 
io 20e (= green CPrges “—tana-+a), 
6 5 3 


and so on, yielding, when n is even, the expression 
a 
In.g = In. —T |n _ 2{ tan" xd, 
era oe) 


a form which may well hold when n is odd. 


The above expressions are useful in tabulating the numerical values of the
second moment, $\bar r^2 + \sigma^2$, of the unit curve, which may easily be calculated in
succession for different values of n when $\tan^2\alpha$ is taken to have some simple
value.


9. Before leaving this aspect of the subject it is worth while to give a more 
detailed examination of the mean of the frequency curves of r when n= 4. 


Two formulae are arrived at by Mr Soper, which are equivalent approximations 
of the second degree 


a fat 3 PA elie 1—p? 3 
L r= p|[1- ay {1+ 7 +8p)} [=p [1-95 {r+ pga +3e9} |, 


1. F=p [1 HOF {tga M9 J=0[1-*Gf1-70-o} 
and these we shall compare with the form

$$\text{III.}\quad \bar r = \frac{2}{\pi}\left(\alpha + \cot\alpha - \alpha\cot^2\alpha\right),$$

ρ   | .1000 | .2000 | .3000 | .4000 | .5000 | .6000 | .7000 | .8000 | .9000 | .9500
I   | .0853 | .1710 | .2578 | .3463 | .4377 | .5333 | .6347 | .7443 | .8649 | .9304
II  | .0847 | .1697 | .2555 | .3419 | .4310 | .5241 | .6236 | .7330 | .8566 | .9254
III | .0850 | .1704 | .2570 | .3451 | .4360 | .5301 | .6290 | .7357 | .8540 | .9209

It will be observed that the approximations lie on either side of the exact
value over the greater part of the range, and that the error of the first
approximation increases up to the value when $\rho = .9$. The second formula
gives the correct value somewhere between .8 and .9, and is thereafter too
large.

For the particular case $\rho = .6608$,
I find (formula III) $\bar r = .5897$, nearly the maximum difference from $\rho$,
Mr Soper gives (p. 109) the value .5933
and the experimental data .5609.


The two theoretical values are much nearer to each other than either is to
the experimental value. On the whole, it is obvious that even in this unfavourable
case Mr Soper’s formulae possess remarkable accuracy.
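Formula III, $\bar r = \frac{2}{\pi}(\alpha + \cot\alpha - \alpha\cot^2\alpha)$ with $\rho = \sin\alpha$, is easily evaluated directly. The following minimal sketch reproduces the line marked III in the table to the figures printed; the function name is illustrative.

```python
import math

def mean_r_n4(rho):
    """Mean of r for samples of four (formula III); assumes 0 < rho < 1."""
    alpha = math.asin(rho)                   # rho = sin(alpha)
    cot = math.cos(alpha) / math.sin(alpha)
    return 2.0 / math.pi * (alpha + cot - alpha * cot * cot)

for rho in (0.1, 0.5, 0.6608, 0.9, 0.95):
    print(rho, mean_r_n4(rho))
```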


10. The use of the correlation coefficient r as independent variable of these
frequency curves is in some respects highly unsatisfactory. For high values of r
the curve becomes extremely distorted and cramped, and although this very
cramping forces the mean $\bar r$ to approach $\rho$, the difference compared with $1-\rho$
becomes inordinately great. Even for high values of n, the distortion in this
region becomes extreme, and since at the same time the curve rapidly changes
its shape, the values of the mean and standard deviation cease to have any very
useful meaning. It would appear essential in order to draw just conclusions from
an observed high value of the correlation coefficient, say .99, that the frequency
curves should be reasonably constant in form.

The previous paragraphs suggest that more natural variables for the treatment
of our formulae are afforded by the transformations

$$t = \frac{r}{\sqrt{1-r^2}},$$

$$\tau = \tan\alpha = \frac{\rho}{\sqrt{1-\rho^2}}.$$

The expression for the frequency curve (II)

$$\frac{(1-\rho^2)^{\frac{n-1}{2}}}{\pi\,(n-3)!}\,(1-r^2)^{\frac{n-4}{2}}\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^{n-1}\frac{\theta^2}{2}\,dr$$

now becomes

$$\frac{(1-\rho^2)^{\frac{n-1}{2}}}{\pi\,(n-3)!}\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^{n-1}\frac{\theta^2}{2}\;\frac{dt}{(1+t^2)^{\frac{n-1}{2}}},$$

and the range of the curve is extended from $-\infty$ to $+\infty$.

It is interesting that in the important case, $\tau = 0$, the frequency reduces to

$$\frac{dt}{(1+t^2)^{\frac{n-1}{2}}},$$

and the curves are identical with those found by “Student” for z,
the probability integral of which he has tabulated in his first paper.
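The change of variable and the curve for $\tau = 0$ can be sketched in a few lines. The normalising constant $\Gamma\!\left(\frac{n-1}{2}\right)\big/\left\{\sqrt{\pi}\,\Gamma\!\left(\frac{n-2}{2}\right)\right\}$ is supplied here as an assumption so that the ordinates integrate to unity; it is not printed in the text, and the function names are illustrative.

```python
import math

def to_t(r):
    """t = r / sqrt(1 - r^2); maps (-1, 1) onto the whole real line."""
    return r / math.sqrt(1.0 - r * r)

def to_r(t):
    """Inverse transformation, r = t / sqrt(1 + t^2)."""
    return t / math.sqrt(1.0 + t * t)

def null_density_t(t, n):
    """Ordinate of the curve c / (1 + t^2)^((n-1)/2) for tau = 0, n >= 3.

    c = Gamma((n-1)/2) / (sqrt(pi) * Gamma((n-2)/2)) is an added
    normalising constant, chosen so that the total area is one.
    """
    log_c = (math.lgamma((n - 1) / 2.0) - math.lgamma((n - 2) / 2.0)
             - 0.5 * math.log(math.pi))
    return math.exp(log_c) * (1.0 + t * t) ** (-(n - 1) / 2.0)

print(to_t(0.99), to_t(0.9), null_density_t(0.0, 8))
```

The first two printed values show how the cramped region near r = 1 is spread out: the step from .9 to .99 in r becomes a step of several units in t.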


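11. The moments of these curves are obtained by the evaluation of the
expressions

$$\int_{-\infty}^{\infty}\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^{n-1}\frac{\theta^2}{2}\;\frac{dt}{(1+t^2)^{\frac{n-1}{2}}}, \qquad \int_{-\infty}^{\infty}\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^{n-1}\frac{\theta^2}{2}\;\frac{t\,dt}{(1+t^2)^{\frac{n-1}{2}}},$$

and so on; of these the first is known already (III) to have the value

$$\frac{\pi\,(n-3)!}{(1-\rho^2)^{\frac{n-1}{2}}},$$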
and the others may be obtained in succession, for 
Greys i << Cre Cedi, oh Cm | Sut Ney G2 aeaateone 
CP eee SiO OO) cara (14 pyr op” 
aoe | ee eS 
TO ptt as i tae ano p annem 
so that the first moment 
i ( ri) Nae tdt fy) 7 |r — 4 ain 


. a 5 n=l ~ 9, ° n= n=4 
sin 806 (1+ #)? 2 (Ch) (ae) 


»n—1 n—1 
eae 2 (1+#)2 


4(n—2)p_ 


—@ 


geal gar ail ss 
n-3V/1—— n-3 


hence $\bar t = \frac{n-2}{n-3}\,\tau$.

The mean, therefore, is greater than the true value $\tau$ by a constant fraction
of its value. And this fraction decreases in the simplest possible manner as n
increases.

In the same way, we may evaluate the second moment,

$$\bar t^2 + \sigma^2 = \frac{1}{n-4}\left\{1 + (n-1)\,\tau^2\right\},$$

and $$\sigma^2 = \frac{1}{n-4}\left\{(1+\tau^2) + \frac{(n-2)\,\tau^2}{(n-3)^2}\right\},$$

the third moment

$$\sqrt{\beta_1}\,\sigma^3 = \frac{(n-2)\,\tau}{(n-3)(n-4)(n-5)}\left\{3(1+\tau^2) + \frac{2(n-1)\,\tau^2}{(n-3)^2}\right\},$$

and the fourth moment

$$\beta_2\,\sigma^4 = \frac{3}{(n-4)(n-6)}\left\{(1+\tau^2)^2 + \frac{6(n-2)\,\tau^2}{(n-3)(n-5)}\,(1+\tau^2) + \frac{(n-2)(3n^2-11n+12)\,\tau^4}{(n-3)^4(n-5)}\right\}.$$

For high values of n, all but the first terms tend to vanish; $\beta_1$ tends to vary
as $\rho^2$, and $\beta_2$ tends to become independent of $\rho$. In effect for high values of $\tau$,
where $\rho^2$ is nearly equal to unity, the form of the curve is nearly constant, but the
skewness measured by $\beta_1$ decreases to zero at the origin, and changes its sense,
when $\tau$ and $\rho$ change their sign.


Tables are appended for inspection rather than for reference which show the
nature and extent of these changes in the form of the curves.


Table of $\sigma^2$.

τ² =  | .01 | .03 | .10 | .30 | 1.00 | 3.00 | 10.00 | 30.00 | 100.00
n = 8 | .2531 | .2593 | .2810 | .3430 | .5600 | 1.140 | 3.350 | 9.550 | 31.250
13 | .1123 | .1148 | .1234 | .1481 | .2344 | .4811 | 1.344 | 3.811 | 12.444
18 | .07219 | .07372 | .07908 | .09438 | .1479 | .3010 | .8365 | 2.367 | 7.722
23 | .05319 | .05429 | .05817 | .06925 | .1080 | .2188 | .6066 | 1.714 | 5.592
33 | .03484 | .03555 | .03805 | .04518 | .07015 | .1415 | .3912 | 1.105 | 3.602
43 | .02590 | .02643 | .02827 | .03353 | .05194 | .1045 | .2886 | .8146 | 2.655
53 | .02062 | .02103 | .02249 | .02666 | .04123 | .08288 | .2287 | .6451 | 2.103

Table of $\beta_1$.

τ² =  | .01 | .03 | .10 | .30 | 1.00 | 3.00 | 10.00 | 30.00 | 100.00 | ∞
n = 8 | .05685 | .1662 | .5076 | 1.230 | 2.450 | 3.788 | 3.965 | 4.153 | 4.184 | 4.252
13 | .01517 | .04776 | .1376 | .3400 | .7058 | 1.018 | 1.205 | 1.271 | 1.296 | 1.3065
18 | .008399 | .02463 | .07645 | .1914 | .4016 | .5857 | .6990 | .7395 | .7546 | .7619
23 | .005757 | .01691 | .05247 | .1317 | .3016 | .4093 | .4910 | .5208 | .5314 | .5361
33 | .003518 | .01035 | .03214 | .08100 | .1731 | .2559 | .3031 | .3260 | .3334 | .3366
43 | .002530 | .007435 | .02315 | .05841 | .1251 | .1858 | .2237 | .2376 | .2429 | .2452
53 | .001973 | .005798 | .01807 | .04562 | .09800 | .1458 | .1757 | .1868 | .1910 | .1928

Table of $\beta_2$.

τ² =  | .00 | .01 | .03 | .10 | .30 | 1.00 | 3.00 | 10.00 | 30.00 | 100.00 | ∞
n = 8 | 6.0000 | 6.1137 | 6.3179 | 7.0179 | 8.4767 | 10.9668 | 12.9652 | 14.1116 | 14.5024 | 14.6508 | 14.7159
13 | 3.8571 | 3.8802 | 3.9248 | 4.0663 | 4.3770 | 4.9397 | 5.4240 | 5.7147 | 5.8186 | 5.8578 | 5.8750
18 | 3.5000 | 3.5121 | 3.5356 | 3.6104 | 3.7937 | 4.0828 | 4.3532 | 4.5186 | 4.5783 | 4.6009 | 4.6109
23 | 3.3529 | 3.3612 | 3.3768 | 3.4271 | 3.5556 | 3.7486 | 3.9356 | 4.0511 | 4.0930 | 4.1089 | 4.1159
33 | 3.2222 | 3.2271 | 3.2365 | 3.2667 | 3.3343 | 3.4619 | 3.5773 | 3.6493 | 3.6756 | 3.6856 | 3.6899
43 | 3.1622 | 3.1656 | 3.1723 | 3.1938 | 3.2422 | 3.3261 | 3.4172 | 3.4692 | 3.4886 | 3.4958 | 3.4991
53 | 3.1277 | 3.1303 | 3.1356 | 3.1522 | 3.1898 | 3.2640 | 3.3281 | 3.3676 | 3.3826 | 3.3883 | 3.3909
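The tabulated constants can be recomputed from closed expressions for the first four moments of t. The sketch below assumes the forms $\bar t = \frac{n-2}{n-3}\tau$, $\sigma^2 = \frac{1}{n-4}\{(1+\tau^2) + \frac{(n-2)\tau^2}{(n-3)^2}\}$, $\mu_3 = \frac{(n-2)\tau}{(n-3)(n-4)(n-5)}\{3(1+\tau^2)+\frac{2(n-1)\tau^2}{(n-3)^2}\}$ and $\mu_4 = \frac{3}{(n-4)(n-6)}\{(1+\tau^2)^2+\frac{6(n-2)\tau^2(1+\tau^2)}{(n-3)(n-5)}+\frac{(n-2)(3n^2-11n+12)\tau^4}{(n-3)^4(n-5)}\}$; these are reconstructions checked against the tables, and the function name is illustrative.

```python
import math

def t_moments(n, tau2):
    """Return (mean, variance, beta1, beta2) of t for sample size n > 6.

    tau2 is tau squared. The closed forms are the assumed expressions
    quoted in the lead-in, not a transcription of the printed formulae.
    """
    tau = math.sqrt(tau2)
    mean = (n - 2) / (n - 3) * tau
    var = ((1 + tau2) + (n - 2) * tau2 / (n - 3) ** 2) / (n - 4)
    mu3 = (n - 2) * tau / ((n - 3) * (n - 4) * (n - 5)) * (
        3 * (1 + tau2) + 2 * (n - 1) * tau2 / (n - 3) ** 2)
    mu4 = 3 / ((n - 4) * (n - 6)) * (
        (1 + tau2) ** 2
        + 6 * (n - 2) * tau2 * (1 + tau2) / ((n - 3) * (n - 5))
        + (n - 2) * (3 * n * n - 11 * n + 12) * tau2 ** 2
        / ((n - 3) ** 4 * (n - 5)))
    return mean, var, mu3 ** 2 / var ** 3, mu4 / var ** 2

for n in (8, 13, 18, 23, 33, 43, 53):
    print(n, t_moments(n, 1.0))
```

Under these assumptions the entries agree with the tables to the figures printed in most cells; one exception appears to be n = 8, $\tau^2 = 3$, where the expression for $\sigma^2$ gives 1.180 against the tabulated 1.140, and the tabulated $\beta_1$ in that cell is consistent with the tabulated rather than the computed $\sigma^2$.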
12. The fact that the mean value $\bar r$ of the observed correlation coefficient is
numerically less than $\rho$ might have been interpreted as meaning that given
a single observed value r, the true value of the correlation coefficient of the
population from which the sample is drawn is likely to be greater than r. This
reasoning is altogether fallacious. The mean $\bar r$ is not an intrinsic feature of the
frequency distribution. It depends upon the choice of the particular variable r
in terms of which the frequency distribution is represented. When we use t as
variable, the situation is reversed. Whereas in using r we cramp all the high
values of the correlation into the small space in the neighbourhood of r = 1,
producing a frequency curve which trails out in the negative direction and so
tending to reduce the value of the mean, by using t, we spread out the region of
high values, producing asymmetry in the opposite sense, and obtain a value $\bar t$
which is greater than $\tau$. The mean might, in fact, be brought to any chosen
point, by stretching and compressing different parts of the scale in the required
manner. For the interpretation of a single observation the relation between
$\bar t$ and $\tau$ is in no way superior to that between $\bar r$ and $\rho$. The variable t has been
chosen primarily in order to give stability of form to the frequency curves in
different parts of the scale. It is in addition a variable to which the analysis
naturally leads us, and which enables the mean and moments to be readily
calculated, and so a comparison to be made with the standard Pearson curves, but
it is not, with these advantages, in a unique position. In some respects the
function, $\log\tan\frac{1}{2}\left(\alpha+\frac{\pi}{2}\right)$, is its superior as independent variable.


I have given elsewhere* a criterion, independent of scaling, suitable for
obtaining the relation between an observed correlation of a sample and the most
probable value of the correlation of the whole population. Since the chance of
any observation falling in the range dr is proportional to

$$(1-\rho^2)^{\frac{n-1}{2}}\,(1-r^2)^{\frac{n-4}{2}}\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^{n-1}\frac{\theta^2}{2}\,dr$$

for variations of $\rho$, we must find that value of $\rho$ for which this quantity is a
maximum, and thereby obtain the equation

$$\frac{\partial}{\partial\rho}\left\{(1-\rho^2)^{\frac{n-1}{2}}\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^{n-1}\frac{\theta^2}{2}\right\} = 0.$$

Since $$\int_0^\infty\frac{dx}{(\cosh x+\cos\theta)^{n-1}} = \frac{1}{(n-2)!}\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^{n-1}\frac{\theta^2}{2},$$

we have $$\frac{\partial}{\partial\rho}\left\{(1-\rho^2)^{\frac{n-1}{2}}\int_0^\infty\frac{dx}{(\cosh x+\cos\theta)^{n-1}}\right\} = 0,$$

which leads by a process of simplification to the equation

$$\int_0^\infty\frac{dx}{(\cosh x-\rho r)^{n}}\,(r-\rho\cosh x) = 0.$$

* R. A. Fisher, “On an absolute criterion for fitting frequency curves,” Messenger of Mathematics,
February, 1912.


Since cosh x is always greater than $\rho r$, the factor in the numerator, $r-\rho\cosh x$,
must change sign in the range of integration. We therefore see that r is greater
than $\rho$. Further an approximate solution may be obtained for large values of n.
The integrand is negligible save when x is very small, and we may write

$1+\frac{x^2}{2}$ for $\cosh x$

and $(1-\rho r)^n\,e^{\frac{nx^2}{2(1-\rho r)}}$ for $(\cosh x-\rho r)^n$.

Then $$r\int_0^\infty e^{-\frac{nx^2}{2(1-\rho r)}}\,dx = \rho\int_0^\infty\left(1+\frac{x^2}{2}\right)e^{-\frac{nx^2}{2(1-\rho r)}}\,dx,$$

and in consequence, as a first approximation,

$$r = \rho\left(1+\frac{1-r^2}{2n}\right).$$

The corresponding relation between t and $\tau$ is evidently

$$t = \tau\left(1+\frac{1}{2n}\right).$$

It is now apparent that the most likely value of the correlation will in general
be less than that observed, but the difference will be only half of that suggested
by the mean, $\bar t$.
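The equation $\int_0^\infty (r-\rho\cosh x)\,(\cosh x-\rho r)^{-n}\,dx = 0$ may also be solved numerically and set beside the first approximation $r = \rho\left(1+\frac{1-r^2}{2n}\right)$. The sketch below is one possible way of doing so, assuming a positive observed r; the truncation of the range at x = 10, the use of Simpson's rule and of bisection, and all names are choices made for illustration.

```python
import math

def likelihood_slope(rho, r, n, upper=10.0, steps=4000):
    """Simpson estimate of the integral of (r - rho*cosh x) / (cosh x - rho*r)^n.

    The range is cut at `upper`, beyond which the integrand is negligible
    for the sample sizes considered; `steps` must be even.
    """
    def f(x):
        c = math.cosh(x)
        return (r - rho * c) * math.exp(-n * math.log(c - rho * r))
    width = upper / steps
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * width)
    return total * width / 3.0

def most_probable_rho(r, n, tol=1e-10):
    """Root in rho of the equation, found by bisection on (0, r); assumes 0 < r < 1."""
    lo, hi = 0.0, r
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if likelihood_slope(mid, r, n) > 0.0:
            lo = mid      # integral still positive: the root lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

def first_approximation(r, n):
    """rho obtained from r = rho * (1 + (1 - r^2) / (2n))."""
    return r / (1.0 + (1.0 - r * r) / (2.0 * n))

print(most_probable_rho(0.5, 50), first_approximation(0.5, 50))
```

The integral is positive at $\rho = 0$ and negative at $\rho = r$, so the root is bracketed and lies below the observed value, as the text states.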


It might plausibly be urged that in the choice of an independent variable we
should aim at making the relation between the mean and the true value approach
the above equation, or rather that to which the above is an approximation, or
that we should aim at reducing the asymmetry of the curves, or at approximate
constancy of the standard deviation. In these respects the function

$$\log\tan\tfrac{1}{2}\left(\alpha+\frac{\pi}{2}\right), \text{ that is, } \tanh^{-1}\rho,$$

is not a little attractive, but so far as I have examined it, it does not tend to
simplify the analysis, and approaches relative constancy at the expense of the
constancy proportionate to the variable, which the expressions in $\tau$ exhibit*.


* [It may be worth noting that Mr Fisher’s $\tau$ is the $\phi$ (square root mean square contingency) of the
more usual notation, and is the expression used in determining the probability that correlated material
has been obtained by random sampling from uncorrelated material. Ed.]


ON THE DISTRIBUTION OF THE STANDARD DEVIATIONS
OF SMALL SAMPLES: APPENDIX I. TO PAPERS BY
“STUDENT” AND R. A. FISHER.


(EDITORIAL.) 


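CONSIDER the population distributed according to the law

$$y = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{1}{2}\frac{(x-m)^2}{\sigma^2}} \quad\ldots\ldots\text{(i)},$$

and let a sample of n represented by the variate values $x_1, x_2 \ldots x_n$ be taken from
it. Then the probability $\delta P$ that this sample will lie between
$x_1$ and $x_1+\delta x_1$, $x_2$ and $x_2+\delta x_2$, ... $x_n$ and $x_n+\delta x_n$,

is $$\delta P = \frac{1}{(\sqrt{2\pi}\,\sigma)^n}\,e^{-\frac{1}{2}\frac{S(x_r-m)^2}{\sigma^2}}\,\delta x_1\,\delta x_2\ldots\delta x_n \quad\ldots\ldots\text{(ii)}$$

$$= \text{const.}\times e^{-\frac{1}{2}\frac{S(x_r-m)^2}{\sigma^2}}\,\delta x_1\,\delta x_2\ldots\delta x_n,$$

where $\bar x = \frac{1}{n}S(x_r)$, $\Sigma^2 = \frac{1}{n}S(x_r-\bar x)^2$, we may write:

$$\delta P = \text{const.}\times e^{-\frac{n}{2}\left(\frac{\Sigma^2}{\sigma^2}+\frac{(\bar x-m)^2}{\sigma^2}\right)}\,\delta x_1\,\delta x_2\ldots\delta x_n \quad\ldots\ldots\text{(iii)}.$$

Changing as Mr Fisher does (see p. 510 above) to $\bar x$ and $\Sigma$ as coordinates
we have:

$$\delta P = \text{const.}\times e^{-\frac{n}{2}\left(\frac{\Sigma^2}{\sigma^2}+\frac{(\bar x-m)^2}{\sigma^2}\right)}\,\Sigma^{n-2}\,\delta\bar x\,\delta\Sigma.$$

We see at once from this* that the law of distribution of samples of means is
the normal curve

$$y = y_0\,e^{-\frac{n(\bar x-m)^2}{2\sigma^2}} \quad\ldots\ldots\text{(iv)}$$

with mean = m, the mean of the population, and with standard deviation
$= \sigma/\sqrt{n}$, a well-known result.

On the other hand the distribution of samples of standard deviations is

$$y = y_0\,\Sigma^{n-2}\,e^{-\frac{n\Sigma^2}{2\sigma^2}} \quad\ldots\ldots\text{(v)}.$$

* Of course the form reached above shows that for normal distributions there is no correlation
between deviations in the mean and in the standard deviation of samples, a familiar fact.
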
This curve was first reached by “Student” as a highly probable result
following from the relations he had obtained from the moments of $\Sigma^2$*.
Mr Fisher’s work thus enables us to justify “Student’s” assumption.
“Student” has discussed at some length the distribution curve for $\Sigma$. He
has obtained the values of the moment coefficients $\mu_2$, $\mu_3$ and $\mu_4$, and the
general expressions for the means when n is even and odd. The whole problem
is of such importance that it seems worth reconsidering, and providing tables
showing the approach of the distribution curve to normality as n rises from
4 to 100.


The following investigation largely repeats work given by “Student,” but it
expresses the values for $\mu_3$, $\mu_4$, and $\beta_1$ and $\beta_2$ in a different form†. We shall not
use approximate expressions for the constants, for the order of terms in 1/n
depends so largely on the relative magnitude of their coefficients, that such
expressions become unreliable for values of n under 100.


Clearly (v) is a skew curve with range limited at one end, $\Sigma = 0$, and not at
the other, $\Sigma = \infty$. See Figure p. 524.

We shall write the standard deviation of $\Sigma$, $\sigma_\Sigma$, and the moments of the
frequency about the end of the range O as $M_1'$, $M_2'$, etc., while the
moment-coefficients about Q will be as usual $\mu_1 (= 0)$, $\mu_2$, etc. Obviously $\mu_2 = \sigma_\Sigma^2$. It is
desirable to ascertain $\tilde\Sigma$, $\bar\Sigma$, $\sigma_\Sigma$ and the skewness as well as $\beta_1$ and $\beta_2$ for the
distribution. We do this to show the rapidity of change to a normal distribution.
It is well, however, to notice a priori that for n large the distribution does become
normal.


* «Student’s” approximate values for 6, and fy (loc. cit. p. 10) are, we fear, erroneous. He gives 


iManeee aly but it is needful to have a further term in we 


a) ene 72 2 order to obtain 6; and fy, correctly to 


the second approximation in =. If this further term be p/n?, then: 


1 64p —3 : A sere 3 
oi on (a + i). as against ‘‘Student’s On (a - i) ; 
16p 1 
Bo oF lgerers ” ” ” 3 (1- 7a). 


An examination of our table (p. 529) shows that ‘‘Student’s” corrections are not of the right sign to 
agree with the facts, and that further no constant value of p would give good results even for fairly 


high values of n, i.e. it is probable that the term in 5 in D? is of equal importance with that in 


† “The Probable Error of a Mean,” Biometrika, Vol. VI. pp. 1-25, more especially pp. 4, 6, and
8 to 10.
[Figure: the frequency curve (v), drawn from the end of the range O. OP = mode = $\tilde\Sigma$, OQ = mean = $\bar\Sigma$.]


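To obtain this approximation to (v) let us assume $\Sigma = \bar\Sigma + \epsilon$, and suppose $\epsilon$
small. Expanding log y we find:

$$\log y = \log y_0 + (n-2)\log\bar\Sigma - \tfrac{1}{2}n\,(\bar\Sigma/\sigma)^2 + \epsilon\left(\frac{n-2}{\bar\Sigma}-\frac{n\bar\Sigma}{\sigma^2}\right) - \frac{\epsilon^2}{2}\left(\frac{n}{\sigma^2}+\frac{n-2}{\bar\Sigma^2}\right) + \text{terms in } \epsilon^3.$$

Hence since $\bar\Sigma$ is at our choice we will take it so that

$$\bar\Sigma^2 = \frac{n-2}{n}\,\sigma^2 \quad\ldots\ldots\text{(vi)},$$

and thus:

$$y = y_0'\,e^{-\frac{1}{2}\frac{\epsilon^2}{\sigma^2/(2n)}\,+\,\text{etc.}}$$

Or, if $\epsilon$ be small compared with $\sigma$, the distribution is the normal curve:

$$y = y_0'\,e^{-\frac{n\epsilon^2}{\sigma^2}} \quad\ldots\ldots\text{(vii)}$$

with mean at $\bar\Sigma = \sigma\sqrt{1-\frac{2}{n}}$, and standard deviation $\sigma/\sqrt{2n}$. If n as usual be
considerable, this agrees with the ordinary result, i.e. $\bar\Sigma = \sigma$ and $\sigma_\Sigma = \sigma/\sqrt{2n}$, the
distribution being treated as normal.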


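We will now deal with the full result (v). We have:

$$M_p' = \int_0^\infty y\,\Sigma^p\,d\Sigma = y_0\int_0^\infty\Sigma^{n+p-2}\,e^{-\frac{n\Sigma^2}{2\sigma^2}}\,d\Sigma \quad\ldots\ldots\text{(viii)},$$

and clearly $M_p'$ depends on a knowledge of

$$I_q = \int_0^\infty u^q\,e^{-\frac{1}{2}u^2}\,du \quad\ldots\ldots\text{(ix)},$$

for we have:

$$M_p' = y_0\left(\frac{\sigma}{\sqrt n}\right)^{n+p-1} I_{n+p-2} \quad\ldots\ldots\text{(x)}.$$

Integrating by parts we find:

$$I_q = (q-1)\,I_{q-2}$$
$$= (q-1)(q-3)\ldots 3\cdot 1\;I_0, \text{ if q be even},$$
$$= (q-1)(q-3)\ldots 4\cdot 2\;I_1, \text{ if q be odd}.$$

Now

$$I_0 = \int_0^\infty e^{-\frac{1}{2}u^2}\,du = \sqrt{\frac{\pi}{2}},$$

$$I_1 = \int_0^\infty u\,e^{-\frac{1}{2}u^2}\,du = 1,$$

thus $M_p'$ is determined, and will depend on whether n + p be even or odd.

But $$M_2' = y_0\left(\frac{\sigma}{\sqrt n}\right)^{n+1} I_n = y_0\left(\frac{\sigma}{\sqrt n}\right)^{n+1}(n-1)\,I_{n-2},$$

and $$M_0' = y_0\left(\frac{\sigma}{\sqrt n}\right)^{n-1} I_{n-2}.$$

Hence $$\mu_2' = M_2'/M_0' = \sigma^2\,\frac{n-1}{n} \quad\ldots\ldots\text{(xi)}.$$

To find the modal value $\tilde\Sigma$ we must differentiate (v), and we have

$$\frac{n-2}{\tilde\Sigma} - \frac{n\tilde\Sigma}{\sigma^2} = 0,$$

which gives

$$\tilde\Sigma = \sigma\sqrt{\frac{n-2}{n}} \quad\ldots\ldots\text{(xii)},$$

a result in agreement with the mean $\bar\Sigma$ of the approximate solution (vi), as we
should anticipate.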


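It now remains to find $M_1'$ and $M_0'$ absolutely:

$$M_1' = y_0\left(\frac{\sigma}{\sqrt n}\right)^{n} I_{n-1},$$

$$M_0' = y_0\left(\frac{\sigma}{\sqrt n}\right)^{n-1} I_{n-2}.$$

Suppose n even, then

$$I_{n-1} = (n-2)(n-4)\ldots 2\times 1,$$
$$I_{n-2} = (n-3)(n-5)\ldots 3\cdot 1\times\sqrt{\frac{\pi}{2}} \quad\ldots\ldots\text{(xiii)}.$$

Hence for n even

$$\bar\Sigma = M_1'/M_0' = \frac{\sigma}{\sqrt n}\;\frac{(n-2)(n-4)\ldots 2}{(n-3)(n-5)\ldots 1}\,\sqrt{\frac{2}{\pi}} \quad\ldots\ldots\text{(xiv A)}.$$

Again for n odd

$$I_{n-1} = (n-2)(n-4)\ldots 1\times\sqrt{\frac{\pi}{2}},$$
$$I_{n-2} = (n-3)(n-5)\ldots 2\times 1,$$

and hence

$$\bar\Sigma = \frac{\sigma}{\sqrt n}\;\frac{(n-2)(n-4)\ldots 1}{(n-3)(n-5)\ldots 2}\,\sqrt{\frac{\pi}{2}} \quad\ldots\ldots\text{(xiv B)}.$$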
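The constants of the distribution (v) can be computed directly for any n. The sketch below is not the logarithmic procedure described in the text: it assumes the equivalent statement that $\sqrt{n}\,\Sigma/\sigma$ follows a chi distribution with n − 1 degrees of freedom (which is what the ordinate $\Sigma^{n-2}e^{-n\Sigma^2/2\sigma^2}$ amounts to), takes the skewness as (mean − mode)/standard deviation, and uses the log-gamma function in place of the products in (xiv A) and (xiv B). All names are illustrative.

```python
import math

def sd_constants(n):
    """Constants of the distribution of the sample standard deviation Sigma
    (divisor n) for samples of n from a normal population, in units of sigma."""
    k = n - 1  # degrees of freedom of sqrt(n) * Sigma / sigma

    def raw(p):
        # p-th moment of Sigma / sigma about zero
        return math.exp(0.5 * p * math.log(2.0 / n)
                        + math.lgamma((k + p) / 2.0) - math.lgamma(k / 2.0))

    m1, m2, m3, m4 = raw(1), raw(2), raw(3), raw(4)
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    sd = math.sqrt(mu2)
    mode = math.sqrt((n - 2) / n)
    return {
        "mode": mode,
        "mean": m1,
        "sd": sd,
        "sd_ratio": sd * math.sqrt(2.0 * n),   # sigma_Sigma / (sigma / sqrt(2n))
        "skewness": (m1 - mode) / sd,
        "beta1": mu3 ** 2 / mu2 ** 3,
        "beta2": mu4 / mu2 ** 2,
    }

for n in (4, 10, 20, 50, 100):
    print(n, sd_constants(n))
```

For even and odd n alike the mean reduces to $\sqrt{2/n}\;\Gamma\!\left(\frac{n}{2}\right)/\Gamma\!\left(\frac{n-1}{2}\right)$, which is (xiv A) and (xiv B) written in one form.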
These accurate values of $\bar\Sigma$, the mean standard deviation of samples, were first
given by “Student” (loc. cit. p. 8). Now by Wallis’ Theorem


Ny Gea £ product-of even numbers up to 2n 
gv" ~ product of odd numbers up to 2n —1° 


Thus (xivA) for n large tends to become 


SS ae eeu 
Vn He 5 me 


Vaio N° 


These values, however, really only suffice to show the approach of $\bar\Sigma$ to $\sigma$, as


M 
II 


and (xivB) 


they depend on the neglect of terms of the order $\frac{1}{n}$ as compared to 1, and we should
get absurd results for $\sigma_\Sigma^2$ by subtracting the square of the above values of $\bar\Sigma$ from
$\mu_2'$ in (xi). All they really tell us is that for n large $\bar\Sigma = \sigma$, but they give no true
approximation in $\frac{1}{n}$.
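If we use Stirling’s Theorem up to the third term†, i.e.

$$x! = \sqrt{2\pi x}\;x^x e^{-x}\left(1+\frac{1}{12x}+\frac{1}{288x^2}\right),$$

we obtain $$\bar\Sigma = \sigma\left(1-\frac{3}{4n}-\frac{7}{32n^2}\right) \quad\ldots\ldots\text{(xv)},$$

$$\sigma_\Sigma^2 = \frac{\sigma^2}{2n}\left(1-\frac{1}{4n}\right) \quad\ldots\ldots\text{(xvi)},$$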
ape ae 
M3 = ane? ie ee 
But we should be compelled to introduce the next term, $-\frac{139}{51840\,x^3}$, of Stirling’s
expression to reach the second terms in $\mu_3$ and $\mu_4$. As we have indicated (p. 523,
ftn.), such a term, even if used, will not lead to profitable results. It is better
to work with the full formulae. It is desirable to find the full third and fourth
moment coefficients in order to determine $\beta_1$ and $\beta_2$ and so measure as n increases
the rapidity of approach to the normal curve.
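How far the approximations $\bar\Sigma = \sigma\left(1-\frac{3}{4n}-\frac{7}{32n^2}\right)$ and $\sigma_\Sigma^2 = \frac{\sigma^2}{2n}\left(1-\frac{1}{4n}\right)$ may be trusted for small n can be seen by setting them beside the full values. The following sketch assumes the full mean in the gamma-function form $\sqrt{2/n}\;\Gamma\!\left(\frac{n}{2}\right)/\Gamma\!\left(\frac{n-1}{2}\right)$ and the second moment $\sigma^2\frac{n-1}{n}$; the function names are illustrative.

```python
import math

def mean_ratio_exact(n):
    """Mean of Sigma divided by sigma, from the full formula."""
    return math.exp(0.5 * math.log(2.0 / n)
                    + math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0))

def mean_ratio_series(n):
    """Approximation 1 - 3/(4n) - 7/(32 n^2)."""
    return 1.0 - 3.0 / (4.0 * n) - 7.0 / (32.0 * n * n)

def variance_ratio_exact(n):
    """sigma_Sigma^2 divided by sigma^2 / (2n), from the full formula."""
    return 2.0 * n * ((n - 1.0) / n - mean_ratio_exact(n) ** 2)

def variance_ratio_series(n):
    """Approximation 1 - 1/(4n)."""
    return 1.0 - 1.0 / (4.0 * n)

for n in (4, 10, 25, 50, 100):
    print(n, mean_ratio_exact(n), mean_ratio_series(n),
          variance_ratio_exact(n), variance_ratio_series(n))
```

The mean is well represented even for small samples, while the variance ratio is appreciably overstated by the two-term form when n is small, which is in keeping with the preference for the full formulae.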


* “Student” has used an extension of Wallis’ Theorem, which will suffice for certain constants only.
† We can write (xiv A)


< o gin-1 ln —1)? 
>= ( [dese J: Fis aeteeseet cuneate en er ee eeeee (xvii), 
Jn |n-2 T 
and (xivB) 
PO: (n-2)[n-38 
= el ™ ee 
Jn (g-3) | 3 (m ios 3))2 ie ig ieielsleieieleltis oloieseleieleteeieivieleleieieintelersteteletete (xviii), 


and then apply Stirling’s Theorem. 
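We have:

$$M_4' = y_0\left(\frac{\sigma}{\sqrt n}\right)^{n+3} I_{n+2} = y_0\left(\frac{\sigma}{\sqrt n}\right)^{n+3}(n+1)\,I_n = \frac{\sigma^2}{n}\,(n+1)\,M_2'.$$

Hence $$\mu_4' = M_4'/M_0' = \frac{\sigma^2}{n}\,(n+1)\,\mu_2' = \sigma^4\,\frac{n^2-1}{n^2} \quad\ldots\ldots\text{(xix)},$$

$$M_3' = y_0\left(\frac{\sigma}{\sqrt n}\right)^{n+2} I_{n+1} = y_0\left(\frac{\sigma}{\sqrt n}\right)^{n+2} n\,I_{n-1} = \frac{\sigma^2}{n}\,n\,M_1'.$$

Hence $$\mu_3' = M_3'/M_0' = \sigma^2\,\bar\Sigma \quad\ldots\ldots\text{(xx)}.$$

Transferring to mean:

$$\mu_3 = \mu_3' - 3\mu_2'\bar\Sigma + 2\bar\Sigma^3 = \frac{\sigma^2\bar\Sigma}{n}\left(1 - 2n\,\sigma_\Sigma^2/\sigma^2\right) = \frac{\sigma^2\bar\Sigma}{n}\left(1-\frac{\sigma_\Sigma^2}{\sigma^2/(2n)}\right) \quad\ldots\ldots\text{(xxi)}.$$

Thus $\mu_3$ will grow small, not only owing to the factor $\frac{1}{n}$ but because $\sigma_\Sigma^2$ tends
to equal $\sigma^2/2n$ as n increases.

Now $$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{\sigma^4\,\bar\Sigma^2}{n^2\,\sigma_\Sigma^6}\left(1-\frac{\sigma_\Sigma^2}{\sigma^2/(2n)}\right)^2.$$

Or $$\beta_1 = 8n\,\frac{\bar\Sigma^2}{\sigma^2}\left(1-\frac{\sigma_\Sigma^2}{\sigma^2/(2n)}\right)^2\Big/\left(\frac{\sigma_\Sigma^2}{\sigma^2/(2n)}\right)^3 \quad\ldots\ldots\text{(xxii)}.$$

Here $\bar\Sigma/\sigma$ is of the form $1-\frac{\gamma_1}{n}$ and $\frac{\sigma_\Sigma^2}{\sigma^2/(2n)}$ of the form $1-\frac{\gamma_2}{n}$,
and thus $\beta_1$ tends as n increases to take the form $8\gamma_2^2/n$, but as $\gamma_2$ may be a
considerable numerical coefficient $8\gamma_2^2$ may be commensurable with n till n is very
considerable.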


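We next turn to $\mu_4$, and shall endeavour to express it in terms of $\mu_2 = \sigma_\Sigma^2$.

Since $$\mu_2' = \mu_2 + \bar\Sigma^2 = \sigma^2\left(1-\frac{1}{n}\right)$$
by (xi), we have
$$\bar\Sigma^2 = \sigma^2\left(1-\frac{1}{n}\right) - \mu_2.$$

Further by (xx)

$$\mu_3'\,\bar\Sigma = \sigma^2\,\bar\Sigma^2 = \sigma^4\left(1-\frac{1}{n}\right) - \sigma^2\mu_2.$$

Thus

$$\mu_4 = \mu_4' - 4\mu_3'\bar\Sigma + 6\mu_2'\bar\Sigma^2 - 3\bar\Sigma^4,$$

$$\frac{\mu_4}{\sigma^4} = \left(1-\frac{1}{n^2}\right) - 4\left(1-\frac{1}{n}\right) + 4\,\frac{\mu_2}{\sigma^2} + 6\left(1-\frac{1}{n}\right)\left(1-\frac{1}{n}-\frac{\mu_2}{\sigma^2}\right) - 3\left(1-\frac{1}{n}-\frac{\mu_2}{\sigma^2}\right)^2$$

$$= \frac{2}{n^2} - \frac{2}{n}\left(1-\frac{\sigma_\Sigma^2}{\sigma^2/(2n)}\right) - \frac{3}{4n^2}\left(\frac{\sigma_\Sigma^2}{\sigma^2/(2n)}\right)^2.$$

Hence

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \left\{8 - 8n\left(1-\frac{\sigma_\Sigma^2}{\sigma^2/(2n)}\right)\right\}\Big/\left(\frac{\sigma_\Sigma^2}{\sigma^2/(2n)}\right)^2 - 3 \quad\ldots\ldots\text{(xxiii)}.$$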


Our results for $\mu_3$ and $\mu_4$, although expressed in other notation, are in
accordance with “Student’s” (loc. cit. p. 9), so also are our results (xv) and (xvi)
although reached by a different method of approximation. We do not agree with
his approximate values for $\mu_3$, $\mu_4$, or $\beta_1$ and $\beta_2$.


The calculations to find $\bar\Sigma/\sigma$, $\sigma_\Sigma/(\sigma/\sqrt{2n})$, $\beta_1$ and $\beta_2$ presented some trouble.
In order to be correct to the four figures of decimals in the tabled results, tables
of ten-figure logarithms had to be used in the logarithmic part of the work.
Formulae (xvii) and (xviii) of the ftn. p. 526 were adopted, using Degen’s Tables of
the Logarithms of Factorials. $\mu_1'$ was calculated to nine figures, and even then,
as n became large, the determination of the antilogarithms presented considerable
difficulty. Further the powers of $1-\sigma_\Sigma^2/(\sigma^2/2n)$ gave rise to trouble. The
numerical work was undertaken by Ethel M. Elderton and Beatrice M. Cave,
to whom very hearty thanks are due. We think the results may be depended
on to the figures tabulated.


It will be seen that by the time n = 50 the mode is as close to the mean as we
should expect to find in any random sample of normal material; the average
mean $\bar\Sigma$ is only 1.5% from the usually adopted value $\sigma$, and the average standard
deviation $\sigma_\Sigma$ only 0.3% from its customary value $\sigma/\sqrt{2n}$. Further $\beta_1$ and $\beta_2$ are
.0105 and 3.0003 respectively, or for all practical purposes have reached their
normal values. We think it must be concluded that for samples of 50 the usual
theory of the probable error of the standard deviation holds satisfactorily, and
that to apply it for the case of n = 25 would not lead to any error which would be
of importance in the majority of statistical problems.


On the other hand, if a small sample, n < 20 say, of a population be taken, the
value of the standard deviation found from it will be usually less than the standard
deviation of the true population. If we take the most probable value, $\tilde\Sigma$, as that
which has most likely been observed, then the result should be divided by the
number in the column entitled mode $\tilde\Sigma/\sigma$ to obtain the most reasonable value for $\sigma$.


For example, if $\Sigma$ be observed, and n = 20, then the most reasonable value to give $\sigma$
is $\Sigma/.9487$.


The paper by Mr Fisher and the accompanying table more or less complete 
the work on the distribution of standard-deviations outlined by “Student” in 
1908. 


Table of Values of the Constants of the Frequency Distribution of the Standard
Deviations of Samples drawn at random from a Normal Population.


Columns: size of sample n; mode $\tilde\Sigma/\sigma$; mean $\bar\Sigma/\sigma$; standard deviation, as $\sigma_\Sigma/\sigma$ and as $\sigma_\Sigma/(\sigma/\sqrt{2n})$;
measures of deviation from normality: skewness, $\beta_1$, $\beta_2$.

n   | Mode  | Mean  | σ_Σ/σ | σ_Σ/(σ/√2n) | Skewness | β₁    | β₂
4   | .7071 | .7979 | .3367 | .9524 | .2696 | .2359 | 3.1082
5   | .7746 | .8407 | .3052 | .9651 | .2168 | .1646 | 3.0593
6   | .8165 | .8686 | .2808 | .9725 | .1857 | .1255 | 3.0370
7   | .8452 | .8882 | .2612 | .9774 | .1648 | .1011 | 3.0251
8   | .8660 | .9027 | .2452 | .9808 | .1495 | .0845 | 3.0181
9   | .8819 | .9139 | .2318 | .9834 | .1380 | .0725 | 3.0136
10  | .8944 | .9227 | .2203 | .9853 | .1285 | .0634 | 3.0106
11  | .9045 | .9300 | .2104 | .9868 | .1209 | .0564 | 3.0085
12  | .9129 | .9359 | .2017 | .9881 | .1144 | .0507 | 3.0070
13  | .9199 | .9410 | .1940 | .9891 | .1088 | .0461 | 3.0059
14  | .9258 | .9453 | .1871 | .9900 | .1041 | .0422 | 3.0049
15  | .9309 | .9490 | .1809 | .9907 | .0998 | .0390 | 3.0042
16  | .9354 | .9523 | .1752 | .9914 | .0961 | .0362 | 3.0036
17  | .9393 | .9551 | .1701 | .9919 | .0927 | .0337 | 3.0032
18  | .9428 | .9576 | .1654 | .9924 | .0897 | .0316 | 3.0028
19  | .9459 | .9599 | .1611 | .9928 | .0869 | .0297 | 3.0025
20  | .9487 | .9619 | .1570 | .9932 | .0844 | .0281 | 3.0022
25  | .9592 | .9696 | .1407 | .9948 | .0745 | .0219 | 3.0014
30  | .9661 | .9748 | .1285 | .9956 | .0674 | .0180 | 3.0009
35  | .9710 | .9784 | .1191 | .9963 | .0620 | .0153 | 3.0007
40  | .9747 | .9811 | .1114 | .9967 | .0577 | .0132 | 3.0005
45  | .9775 | .9832 | .1051 | .9971 | .0541 | .0117 | 3.0004
50  | .9798 | .9849 | .0997 | .9974 | .0512 | .0105 | 3.0003
55  | .9816 | .9863 | .0951 | .9977 | .0488 | .0095 | 3.0003
60  | .9832 | .9874 | .0911 | .9979 | .0467 | .0087 | 3.0002
65  | .9845 | .9884 | .0875 | .9980 | .0447 | .0080 | 3.0002
70  | .9856 | .9892 | .0844 | .9982 | .0430 | .0074 | 3.0002
75  | .9866 | .9900 | .0815 | .9983 | .0415 | .0069 | 3.0001
80  | .9874 | .9906 | .0789 | .9984 | .0402 | .0064 | 3.0001
85  | .9882 | .9911 | .0766 | .9985 | .0389 | .0060 | 3.0001
90  | .9888 | .9916 | .0744 | .9986 | .0378 | .0057 | 3.0001
95  | .9894 | .9921 | .0725 | .9987 | .0367 | .0054 | 3.0001
100 | .9899 | .9925 | .0706 | .9987 | .0358 | .0051 | 3.0000


TUBERCULOSIS AND SEGREGATION. 
By ALICE LEE, D.Sc.


(1) In his book The Prevention of Tuberculosis (London: Methuen, no date 
on the issue we have used) Dr A. Newsholme has examined the influence of 
segregation on Tuberculosis. This is the topic of Chapter xxxv. In the opening 
of this chapter, he writes: 


The exact measure of institutional segregation of phthisis is the ratio stating how many of
the total days of sickness (number of patients and number of days of sickness) are passed in
institutions. This ratio and the equivalents for it which have to be used in practice may
for convenience be called the segregation ratio. The need for equivalents for the ratio as stated 
above arises from the fact that we are dealing with actual recorded experience, and the material 
has to be taken from the records as they happen to exist. (p. 266.) 


After noting the incompleteness of the records, Dr Newsholme continues : 


It becomes necessary therefore to select other figures which vary approximately with the total 
days of tuberculous sickness and the total days of tuberculous sickness passed in institutions. 
(p. 266.) 

We shall discuss below what “indirect measures of segregation” Dr Newsholme 
selects, but he gives the following most proper caution with regard to them: 

In using these indirect measures of institutional treatment of tuberculosis and of its pre- 
valence it must be remembered that they are indirect and approximate. Thus, for instance, 
figures for institutional treatment usually give the number of cases and not days of treatment, 
and while they tell how many people were segregated in institutions do not show the average
duration, still less the quality of the treatment. Any of these indirect forms of segregation ratio 
has therefore to be verified wherever possible by the application to the same community and 
period of one or more other forms of the ratio, and checked wherever practicable by a special 
examination of sample constituent communities whose figures are included in the total. (p. 268.) 


Dr Newsholme in the course of his chapter gives a number of very high 
correlations between the phthisis deathrate and the indirect forms of the segre- 
gation ratio he has selected, and he interprets these as well as a long series 
of graphs as demonstrating that institutional segregation has been a most 
important factor in the diminution of the phthisis deathrate. Now any two 
variates which are changing continuously with the time, say the consumption
of bananas per head of the population and the fall in the birthrate, will exhibit
high correlation and will show graphically very high association, if plotted to 
appropriate scales and on a common time basis. Until the time factor has 
been removed, either by partial correlation or otherwise, it would be most un- 
wise to interpret such cases as providing any causal relationship. 
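The point can be illustrated numerically. The following is only a sketch under stated assumptions: the two series (`bananas`, `birthrate`) are invented straight-line trends with independent noise, not figures from any report, and the time factor is removed here by a first-order partial correlation on the year.

```python
import math
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial(r_xy, r_xt, r_yt):
    """Correlation of x and y for constant t (first-order partial)."""
    return (r_xy - r_xt * r_yt) / math.sqrt((1 - r_xt ** 2) * (1 - r_yt ** 2))


rng = random.Random(0)            # fixed seed so the run is repeatable
years = list(range(1866, 1904))   # 38 yearly entries
# Hypothetical series: a straight time trend plus independent noise in each.
bananas = [10 + 0.5 * (t - 1866) + rng.gauss(0, 1) for t in years]
birthrate = [35 - 0.3 * (t - 1866) + rng.gauss(0, 1) for t in years]

r_crude = pearson(bananas, birthrate)
r_partial = partial(r_crude, pearson(bananas, years), pearson(birthrate, years))
print(f"crude r = {r_crude:+.3f}, r for constant time = {r_partial:+.3f}")
```

The crude coefficient is large only because both series move with the year; once the year is held constant, what remains reflects the independent noise alone.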


It seemed accordingly worth while to reinvestigate Dr Newsholme’s problems 
with the aid of a rather more adequate statistical apparatus. 


(2) We must frankly confess at the outset that we have had great difficulty 
in following Dr Newsholme’s description of the methods he has adopted to 
measure the amount of segregation. His charts do not seem always in ac- 
cordance with his tables, and both are occasionally out of agreement with his 
definitions. As he does not give the raw data on which his correlations are 
based, but only condensed versions of them in his tables and graphs, it is 
impossible to test his conclusions without returning to the original sources, 
which are not always stated, and when we have found them and our results 
differ, we are unable to say whether the difference is due to failure in his or 
in our arithmetic, or to divergences between his and our records. 


Dr Newsholme uses in all some six measures of the segregation ratio, four 
intentionally and two apparently by inadvertence. 


Let $P$ = total population of a given area, $\phi$ = the total number of annual deaths
from phthisis. Then $\phi/P$ multiplied by 10,000 or 100,000, as the case may be, gives
the crude deathrate from phthisis. Let $D_i$ be the deaths from all causes which
occur in institutions and $D$ the total deaths in the same area, then $100D_i/D$ is
Dr Newsholme’s first approximation to the segregation ratio*. On p. 270 he
gives two tables which show, (a) in England and Wales as a whole, (b) in London,
that, while in the course of forty years $1000\phi/P$ has practically halved, $100D_i/D$
has practically doubled. The data, Dr Newsholme tells us, show “not only a
very close correspondence between the increase of total institutional segregation
measured by the ratio in question and the decrease of phthisis, but an even more
striking similarity in the ratio at which these changes have occurred” (p. 271).
This is illustrated by a graph on p. 271, in which the logarithms of the phthisis
deathrate are plotted to time against the logarithms of the indices of institutional
deaths to all deaths†. We do not know why Dr Newsholme has chosen this
method of representation; it certainly, with his choice of scales, makes the two
curves roughly parallel, but this does not demonstrate the “similarity in the ratios
at which these changes have occurred.” For, if the actual values be plotted to
the time, the curve of phthisis deathrate is convex and the institutional deathrate
concave to the time axis, in other words while the rate of one is increasing,


* The assumption made appears to be that for the period in question $D_i$ is proportional to the
institutional deaths from phthisis, a very big assumption.

† The logarithms of the ratios of institutional deaths to all deaths appear to be either wrongly
plotted or wrongly calculated.
the rate of the other is decreasing during the period in question, always on the
supposition that we plot the results as Dr Newsholme has done with reversed
directions of increasing scales for the two indices. He states that “the experience
is summarised in the high correlation coefficients of .91 for England and Wales
(1878–1903) and .90 for London (1866–1904)” (p. 271). The correlations
found from his actual tables do not appear to agree with these, being, for example,
-.93 for England and Wales with the negative sign as we should anticipate; but
as Dr Newsholme does not give the same years for his correlation coefficients as 
in his tables, he may have worked out his coefficients for individual years. It is 
impossible to test the matter, as neither the figures nor their source are provided. 


If, however, we take his Tables LXII and LXIII, and apply the variate 
difference method* to Dr Newsholme’s data as they stand in his book, which 
are all the data available, we find 


Correlation of Phthisis Deathrate and Ratio of Deaths in
Institutions to Total Deaths.
England and Wales:  Third Differences   -.174 ± .293,
London:             Second Differences  -.094 ± .252.


In other words the data show no significant relationship between this measure 
of segregation and the phthisis deathrate, when the time-factor is annulled, even 
with the early differences. It is impossible to press the matter further because 
the data are far too sparse for difference treatment, but the results, such as they 
are, are sufficient to indicate that Dr Newsholme’s high correlations are solely due 
to the fact that both variates are continuously changing with the time†.
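A minimal sketch of the variate difference idea, assuming only that it consists in correlating the successive differences of the two series rather than the series themselves. The series `x` and `y` are hypothetical (smooth time trends plus small unrelated wobbles) and the function names are illustrative; no probable errors are computed here.

```python
import math


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def difference(series, order):
    """Return the order-th successive differences of a series."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def variate_difference_correlations(x, y, max_order):
    """Correlations of x and y (order 0) and of their 1st..max_order-th differences."""
    return [pearson(difference(x, k), difference(y, k)) for k in range(max_order + 1)]


# Hypothetical series sharing nothing but a smooth (quadratic) time trend.
t = list(range(20))
x = [100 - 2.0 * i + 0.05 * i * i + math.sin(1.7 * i) for i in t]
y = [20 + 1.5 * i - 0.03 * i * i + math.cos(2.9 * i) for i in t]

rs = variate_difference_correlations(x, y, 3)
for k, r in enumerate(rs):
    print(f"difference order {k}: r = {r:+.3f}")
```

Each differencing shortens the series by one entry, which is why very sparse tables cannot be carried to high differences.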


(3) As a second measure of segregation Dr Newsholme takes $100\phi_i/\phi$, and
$1000\phi/P$ is then correlated with this, $\phi_i$ being the deaths from phthisis in
institutions. On p. 275 Dr Newsholme gives very meagre data for Brighton, 
Sheffield and Salford in groups of years, six pairs of values for Sheffield, five for 
Brighton and four for Salford. It is thus impossible to test these for annulment 
of the time-factor, and no references are given to the sources of the original data. 
On p. 276 we read: 

Coefficients of correlation summarising this correspondence for long series of single years
work out at .67 for Salford from 1884 to 1904 and .80 for Sheffield from 1876 to 1905‡.

If the arithmetical values be correct, they should certainly have negative signs, 
but even then they would not demonstrate anything but the increasing use of 
institutions and the decreasing prevalence of phthisis during the years in question. 


* Biometrika, Vol. x. pp. 179, 341. 

† These values might be modified if we could go to higher differences, but this is impossible on the
very limited data which Dr Newsholme provides. On these data all we can state is that no evidence 
of organic relationship between the variates, such as is asserted by Dr Newsholme to exist, can be 
demonstrated. 

‡ There is no statement as to why Brighton has been omitted.
There is, however, a much graver criticism to be made of Dr Newsholme’s
method in this measure of segregation. He proposes to correlate

$$100\phi_i/\phi \quad\text{and}\quad 1000\phi/P,$$

and interprets the high correlations as a sign of the value of segregation in
reducing the phthisis deathrate. We have not his data to test his conclusions by,
but we can compare them against certain results for 38 years in (i) England and
Wales, (ii) Scotland, and (iii) Ireland. Here they are:


Correlation of Phthisis Deathrate and Ratio of Institutional
Phthisical to all Phthisical Deaths.


Years       District            Correlation
1866–1903   Scotland            -.9815 ± .0040
1866–1903   England and Wales   -.9750 ± .0054
1866–1903   Ireland             -.8720 ± .0262
1876–1905   Sheffield           -.80 ± .0443
1884–1904   Salford             -.67 ± .0811
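The ± figures attached to the coefficients in the table above appear to agree with the classical probable error of a correlation coefficient, 0.6745 (1 - r²)/√n, where n is the number of yearly entries. That formula is not stated in the text; it is an assumption that happens to reproduce the printed values, as the following sketch checks.

```python
import math


def probable_error(r, n):
    """Classical probable error of a correlation r from n pairs (assumed formula)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# (district, r, number of yearly entries implied by the stated span of years)
rows = [
    ("Scotland", -0.9815, 38),           # 1866-1903
    ("England and Wales", -0.9750, 38),  # 1866-1903
    ("Ireland", -0.8720, 38),            # 1866-1903
    ("Sheffield", -0.80, 30),            # 1876-1905
    ("Salford", -0.67, 21),              # 1884-1904
]
for district, r, n in rows:
    print(f"{district:18s} {r:+.4f} ± {probable_error(r, n):.4f}")
```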


The reader may imagine in this table a confirmation of Dr Newsholme’s 
results, for the larger material gives higher values of the correlations. On the 
contrary, these correlations have been obtained by taking as the measure of 
segregation the ratio 
$$\frac{\text{Mean institutional deaths per annum from phthisis, 1866--1903}}{\text{Annual total deaths from phthisis}}.$$


Now it is clear that this index never varies with the increasing percentage 
of institutional deaths from phthisis. Yet all the correlations are greater than 
Dr Newsholme’s! We have little doubt that he would get higher values than he 
has done, if he replaced the actual institutional deaths per annum by the constant 
mean value. In other words the results reached by him are of no significance, 
for we get higher correlations by putting a single fictitious value for the annual 
institutional deathrate. 


The real source of his result is not the strong influence of segregation on 
phthisis, but the spurious correlation introduced by using the phthisis deaths, $\phi$,
in the numerator of one variate, $1000\phi/P$, and in the denominator of the other,
$100\phi_i/\phi$. Thus no scientific results of value can be found from Dr Newsholme’s
second measure of segregation. 
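The mechanism can be tried on invented figures. In the sketch below every number is hypothetical (it is not the registrars' data): phthisis deaths `phi` fall, the population `P` grows, and the institutional deaths `phi_i` rise. The second index freezes the institutional deaths at their mean, so that it cannot reflect any change in segregation at all.

```python
import math


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


n = 38  # yearly entries
# Hypothetical yearly figures, for illustration only.
P = [22_000_000 + 300_000 * k for k in range(n)]
phi = [55_000 - 400 * k + 1_500 * math.sin(0.9 * k) for k in range(n)]
phi_i = [3_000 + 150 * k for k in range(n)]
mean_phi_i = sum(phi_i) / n

rate = [1000 * f / p for f, p in zip(phi, P)]             # 1000 phi / P
seg_actual = [100 * fi / f for fi, f in zip(phi_i, phi)]  # 100 phi_i / phi
seg_constant = [100 * mean_phi_i / f for f in phi]        # numerator frozen at its mean

r_actual = pearson(rate, seg_actual)
r_constant = pearson(rate, seg_constant)
print(f"actual numerator:   r = {r_actual:+.3f}")
print(f"constant numerator: r = {r_constant:+.3f}")
```

Both coefficients come out strongly negative, although the second index contains no information about segregation; the sign and size are supplied by $\phi$ appearing on both sides.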


In discussing this second measure of segregation, Dr Newsholme lays great stress 
on the part played by asylums for the insane in segregating the tuberculous. He 
notes that the percentage of lunatics treated privately with relatives and others was 
18.4 in 1859 and fell to 5.5 in 1902, thus marking increasing segregation during the
period of fall in the phthisis deathrate. He states (p. 274) that: “the deathrate
from tuberculosis in borough and county asylums in 1901 was 15.8 per cent. of
the inmates, and over ten times as great as in the general population.” Now
Dr Newsholme’s figure appears to be quoted from the 56th Annual Report of the
Commissioners in Lunacy, and in this case it should read 15.8 per 1000 and not
per 100, and although Dr Newsholme appears to have made a similar slip in 
dealing with the deathrate in the general population, he seems to be comparing 
deaths from all forms of tuberculosis among the insane—some of which have 
possibly a direct relation to their insanity—with deaths from phthisis alone in the 
general population. Further he has made no allowance for the very marked 
difference between the age distributions of the two groups he is comparing. 
The difference is so great that a phthisis deathrate of 1.46 per 1000 in the
general male population is equivalent to one of 2.41 per 1000 among the insane
population of males. Even if the corrected deathrate among the insane for 
phthisis were ten times its magnitude among the sane, we fail to understand 
what Dr Newsholme means when he asserts that: “the segregation of each 
tuberculous lunatic has been equivalent to the withdrawal of ten ordinary tuber- 
culous persons” (p. 274). Because tuberculosis among lunatics is ten times as 
frequent—judging by deaths, and accepting for the purpose of argument Dr News- 
holme’s figures—why should the isolation of one tuberculous lunatic be equivalent 
to the withdrawal of ten sane tuberculous persons? That must suppose a tuber- 
culous lunatic capable of spreading ten times the infection of a tuberculous but 
sane individual. All Dr Newsholme could say would be that from the standpoint 
of segregation it is ten times more desirable to segregate any lunatic, than 
any sane person, for the former is ten times as likely to die of tuberculosis. 
Dr Newsholme brings no evidence to show that the individual tuberculous 
lunatic is ten times as dangerous as the individual tuberculous sane person. 
As a matter of fact we still need very careful investigation of the relation of 
lunacy to tuberculosis, not only having regard to some forms of tuberculosis as 
possible sources of feeble-mindedness, if not of insanity, but also having regard 
to whether the old idea of asylum segregation as a possible cause of the spread 
of tuberculosis among lunatics is wholly erroneous, and we might further examine 
whether the new idea that the majority of tuberculous lunatics were tuberculous 
on admission is in its turn wholly sound*. In the present state of our know- 
ledge we think the assertion that the increased segregation of lunatics has 
substantial relation to the decrease in the phthisis deathrate is quite unproven. 


(4) Dr Newsholme’s third approximation to the segregation ratio is the
index $100p_i/p_t$, where $p_i$ is the number of paupers in institutions and $p_t$ is
the total number of paupers, indoor and outdoor. Unfortunately Dr Newsholme’s
usage does not agree with his definition. The index he appears to use is generally
$100p_t/p_i$, and the values of this are given in the last column of Table LXV
(p. 277) and Table LXVII (p. 279). In Table LXVI (on p. 277), however, the
100 factor is dropped and $p_i/p_t$ again used in the heading to the central column,

* Many lunatics enter and re-enter asylums; it does not follow because they died of tuberculosis
and were tuberculous on last admission that their tuberculosis was there on first admission. 
although the figures in that column appear to refer to $100p_t/p_i$. Below this
table occur the words:


This experience for the entire series of individual years is expressed by a coefficient of
correlation of -.94 between segregation measured by the fraction of pauper population treated
in institutions and the phthisis deathrate. (p. 277.)


The correlation to support Dr Newsholme’s views should be negative if
$100p_i/p_t$ has been used, and positive if $100p_t/p_i$ has been used. But as many of his
other correlations are given with the wrong sign, it is difficult to discover what
measure of segregation he actually has used. To add to the confusion the index
actually plotted is $\log p_t/p_i$, and not $100p_i/p_t$, which is what Dr Newsholme
defines as his index. We have accordingly in our analysis of the figures, to be
given later, used both indices $100p_i/p_t$ and $100p_t/p_i$.


It is very difficult to appreciate how the ratio $100p_i/p_t$ can effectively measure
the segregation ratio—it is indeed impossible to agree with Dr Newsholme’s view 
that any of his indices “ measure with approximate accuracy the ratio which states 
how many of total days of tuberculous sickness are passed in institutions.” 


The policy of compelling as many paupers as possible to go into the workhouse 
was directly adopted with a view to diminishing the total pauperism as well as 
abuses connected with outdoor relief, and that policy is the source of increase in 
the index $100p_i/p_t$. Had Dr Newsholme examined his own Tables LXV, LXVII
and LXIX carefully, he would have seen that the percentage of indoor paupers 
on the general population has remained almost constant for the period in question, 
while the total paupers per cent. of the general population in England with Wales 
and in Scotland have decreased. If the same relative number of paupers are segre- 
gated now as formerly, how can this segregation have diminished the chances of 
infection in the community? We can hardly assume that all paupers are tuber- 
culous, or markedly so relatively to other men, so that the reduction of the number 
of outside paupers by indoor segregation is equivalent practically to a reduction 
pro tanto (note the extraordinarily high correlations !) of the number of tuberculous 
in the community. If so, then the reduction of the tuberculous deathrate would 
be due not to the segregation, but to the large decrease in the total pauperism 
relative to the population of this country. The correlation, as we shall demon- 
strate, is not between the segregation of paupers and the phthisis deathrate, but 
between the diminution of total pauperism and the phthisis deathrate. We shall 
investigate how far this relationship between total pauperism and the phthisis 
deathrate is “organic,” i.e. continues after the annulment of the time-factor, or is 
purely due to the fact that both pauperism and phthisis have diminished during 
the forty-year period under consideration. 


It was this third definition of a segregation ratio in conjunction with the 
fourth segregation ratio to be considered later that led us to realise that the whole 
problem must be dealt with afresh, and the modern methods of partial correlation
and variate difference correlation applied to its various aspects. We have taken
the period used by Dr Newsholme, 1866-1903 inclusive, and have used the figures
for each individual year, thus obtaining 38 entries, which are few indeed, but the
best we can probably do with data of this kind, and therefore directly comparable 
with Dr Newsholme’s results, for he seems to have used individual years for his 
correlations although he does not always say so (cf. pp. 271 and 280), and notwith- 
standing that his tables are all given for five-year periods. 


The population numbers for England and Wales (Table A) were taken from 
the Registrar-General’s Annual Report for 1909, and the phthisis deaths from the 
Reports for 1866-1903; the average of each five years’ period agrees with Dr 
Newsholme’s values for phthisis, but the values for indoor and for total paupers 
do not quite agree with his. Dr Newsholme was therefore written to and asked 
whence he obtained his numbers. He was kind enough to reply, but said that 
he was unable to refer at the moment to the original tables, but that undoubtedly 
the data were the statistics given in the Annual Reports of the Registrars-General 
for England, Scotland and Ireland. We then examined the Local Government 
Board returns and found that Dr Newsholme apparently had used the pauper 
returns for the January quarter of each year. We kept therefore to the Registrar- 
General’s Report, as the numbers there given are based on the Local Government 
Board’s returns for the whole year, which are a fairer measure of pauperism than 
those for the January quarter alone. 


For Scotland, our numbers (Table A) agree with Dr Newsholme’s for both 
phthisis and indoor paupers, except when we take the first five-year period 
(1866-70), where they differ slightly. In the case of total paupers for the periods 
1866-70, 1881-85, and 1896-1900 our figures do not agree*. We cannot find 
any reason for these divergences except a slip in his or our arithmetic, or the 
possibility that a wrong number of outside paupers has been taken by one or other 
of us. We do not think the differences in the values are such as to invalidate 
a comparison of results. 


In Ireland the only serious discrepancy in our values is in the total number 
of paupers for the period 1876-80. 


These discrepancies, however, emphasise the very necessary rules for statistical 
treatment: (1) that the ultimate raw data should be published with every inquiry, 
and (ii) it should be stated exactly where they are taken from, and how they have
been treated. 


Table A gives our raw data, Table B our deathrates and indices based thereon. 


We have correlated the phthisis deathrate taken as $10^5\phi/P$ with $100p_i/p_t$ and


* We are unable to compare his and our data for individual years, because Dr Newsholme has only 
published his data for five-year periods. 
$100p_t/p_i$. Taking first England and Wales, and calling these three indices
respectively $I_0$, $I_1$ and $I_2$, we find:


Correlation of $I_0$ and $I_1$ = -.9664 ± .0072,
               $I_0$ and $I_2$ = +.9298 ± .0148.


Dr Newsholme gives -.94 as the coefficient of correlation “between segregation
measured by the fraction of pauper population treated in institutions and the
phthisis deathrate” (p. 277). Having regard to his confusion of $I_1$ and $I_2$ and
his frequent interchange of the signs of correlation coefficients, we can only say 
our results confirm his high numerical value, but not his actual figure. 


But does this actual figure mean that there is any real relationship between 
segregation and the phthisis deathrate? To test this, we replaced the index $I_1$ by
$I_3$, where

$$I_3 = 100 \times \frac{\text{Mean number of indoor paupers per } 10^5 \text{ of the population, 1866--1903}}{10^5 \times p_t/P} = 100\,\overline{(p_i/P)}\big/(p_t/P).$$

In this index the relative number of indoor paupers is assumed to remain
absolutely constant. We found:


Correlation for England and Wales of $I_3$ and $I_0$ = -.9459 ± .0115,


that is to say we get substantially the same value, a value higher than 
Dr Newsholme’s, by putting the number of indoor paupers relative to the 
general population constant throughout the period. It is very difficult, in the face 
of such a result, to suppose that segregation of paupers has anything whatever 
to do with the diminution of the phthisis deathrate. It is clearly due to a
negative correlation of a high magnitude between $1/(p_t/P)$ and $\phi/P$, or to a positive
correlation between $p_t/P$ and $\phi/P$, i.e. to a correlation between a high total pauper
rate and a high phthisis deathrate. Dr Newsholme’s result merely reduces to
the statement that total pauperism in England and Wales has diminished contemporaneously
with phthisis. If the result has nothing to do with segregation,
can we assert that the reduction of phthisis is causally related to the reduction in
total pauperism?
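The reduction can be written out in a line or two. The following is a sketch in the notation of this section, with $c$ (a symbol introduced here for illustration, not used in the original) standing for the constant mean number of indoor paupers per head of population:

```latex
I_1 = 100\,\frac{p_i}{p_t} = 100\,\frac{p_i/P}{p_t/P},
\qquad
I_3 = 100\,\frac{c}{p_t/P},
\qquad
c = \overline{p_i/P}\ \text{(held constant over 1866--1903)}.

% A correlation coefficient is unchanged when a variate is multiplied
% by a positive constant, so the factor 100c (and the scale of I_0) drops out:
r\!\left(I_3,\ I_0\right) \;=\; r\!\left(\frac{1}{p_t/P},\ \frac{\phi}{P}\right).
```

Nothing on the right-hand side involves the indoor paupers, which is why a high value of this coefficient cannot be evidence about segregation.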


Overlooking for a moment a new objection to be raised later, let us apply the 
variate difference method to the correlation of $\phi/P$ with $100p_i/p_t$ and $100p_t/p_i$
in the cases of England with Wales, of Scotland, and of Ireland; also to the
correlation of $\phi/P$ with the index $100\,\overline{(p_i/P)}/(p_t/P)$ in the case of England with
Wales. The following are the results:


TABLE I.
Correlation of $10^5\phi/P$ with the indices $I_3$, $I_1$ and $I_2$.

       |             England with Wales             |          Scotland           |           Ireland
       |    $I_3$     |    $I_1$     |    $I_2$     |    $I_1$     |    $I_2$     |    $I_1$     |    $I_2$
Crude  | -.946 ± .012 | -.966 ± .007 | +.930 ± .015 | -.952 ± .010 | +.920 ± .017 | -.881 ± .024 | +.893 ± .022
Δ1     | +.090 ± .134 | -.258 ± .126 | +.340 ± .120 | -.265 ± .126 | +.250 ± .127 | -.280 ± .125 | +.235 ± .128
Δ2     | -.201 ± .149 | -.461 ± .123 | +.542 ± .110 | -.240 ± .147 | +.182 ± .151 | -.264 ± .145 | +.180 ± .151
Δ3     | -.335 ± .153 | -.508 ± .127 | +.567 ± .116 | -.205 ± .164 | +.086 ± .170 | -.226 ± .163 | +.162 ± .167
Δ4     | -.407 ± .155 | -.518 ± .136 | +.547 ± .130 | -.186 ± .179 | +.024 ± .185 | -.182 ± .179 | +.133 ± .182
Δ5     | -.475 ± .153 | -.528 ± .143 | +.529 ± .142 | -.182 ± .191 | -.003 ± .198 | -.145 ± .194 | +.108 ± .195
Δ6     | -.538 ± .149 | -.543 ± .147 | +.533 ± .151 |      –       |      –       | -.112 ± .206 | +.081 ± .208
Δ7     | -.584 ± .145 | -.562 ± .150 | +.539 ± .156 |      –       |      –       |      –       | +.044 ± .219
Δ8     | -.614 ± .143 | -.587 ± .151 | +.557 ± .159 |      –       |      –       |      –       | -.004 ± .230


It will be seen from this table that whether we use the index $I_1$ or its
inverse $I_2$, we get practically the same results, naturally with changed sign.
But the results themselves are of extraordinary interest. For both Scotland 
and Ireland, when we proceed to annul the time-factor by correlating successive 
differences, we find that the high correlations interpreted by Dr Newsholme as 
marking a relation between pauper segregation and phthisis deathrate entirely 
disappear or become less than their probable errors. There is thus no organic 
relation between these variates as measured by the above indices. In the case 
of England and Wales, however, while there is a reduction on annulment of the 
time-factor to roughly two-thirds of the high value noted by Dr Newsholme, 
this value does not tend to disappear with increasing differences. Thus in 
England with Wales, as apart from the remainder of Great Britain, there would 
at first sight appear to be an organic relation between segregation of paupers 
and the phthisis deathrate. But our first column under the England with Wales 
section shows that if we fix the percentage in the general population of these 
indoor paupers and then annul the time-factor, we reach a slightly higher value 
of this apparent organic relation. It has therefore nothing to do with segregation. 
Thus Dr Newsholme’s interpretation of his original high correlations appears in 
every case fallacious. 


There are two methods of testing this result, i.e. the absence of organic
relationship between indoor pauperism and phthisis. Suppose we correlate the 
crude numbers of phthisis deaths per annum and of indoor paupers per annum, 
the resulting coefficient will have very small logical value because both these 


variates are continuously changing with the time*. But now suppose we annul 


* It is noteworthy that the England with Wales and the Scotland correlation coefficients for these 
crude variates are high and negative, but for Ireland the coefficient is moderate and positive. Thus 
the factors at work must be totally different in the two Islands. Since indoor paupers relative to 
the population have remained singularly constant the increase of phthisis deaths must have been much 
slower than the population increase in Great Britain, but somewhat faster in Ireland. 
the time-factor by correlating the differences of these variates, then we shall free
ourselves from the influence of the time-variate, and in doing this we shall also 
free ourselves practically from the influence of change of population, which is a 
time change. 


The following table resulted from this investigation. 


TABLE II.
Correlation of Crude Phthisis Deaths ($\phi$) and Indoor Paupers ($p_i$).

Variates | England with Wales |   Scotland   |   Ireland
Crude    |    -.932 ± .014    | -.718 ± .053 | +.457 ± .086
Δ1       |    -.376 ± .116    | -.206 ± .130 | -.092 ± .134
Δ2       |    -.302 ± .141    | -.219 ± .148 | -.103 ± .154
Δ3       |    -.213 ± .164    | -.180 ± .166 | -.143 ± .168
Δ4       |    -.100 ± .183    | -.157 ± .181 | -.147 ± .181
Δ5       |    -.016 ± .198    | -.158 ± .193 | -.140 ± .194

It will be seen that for all three countries, whether we start with the positive 
correlation of the Irish or the negative correlation of the English and Scottish 
returns, there is no remaining significant correlation after annulment of the time- 
factor between indoor pauperism and phthisis. 


A second method of verifying our conclusions is to find the partial correlation 
between indoor pauperism and phthisis deaths for a constant value of the total 
population and a constant value of total pauperism. We thus ask the question 
whether with a constant population and a constant amount of total pauperism, 
an increase of indoor pauperism would organically affect the number of deaths 
from phthisis. By making the population and the total pauperism constant we
are largely producing an annulment of the time-factor and ascertaining whether 
a change in the number of indoor paupers due to causes other than temporal 
influences the number of deaths from phthisis. 


The system of correlation coefficients given in Table III, p. 540, was determined.
Here the values of $_{p_t P}r_{p_i\phi}$ for England with Wales and for Scotland confirm
the conclusions we have reached by other methods, i.e. there is no significant
relationship at all between phthisis and indoor pauperism. The value for Ireland
is, perhaps, significant, but having regard to its smallness (-.3 ± .1) and the size
of its probable error, no one can lay real stress on it, in opposition to the results 
of the other two countries. In general the coefficients for the Irish data appear 
very anomalous, and certainly divergent from those for Great Britain. 
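The second method can be sketched as follows, assuming the usual recursion for partial correlation: hold the population constant first, then hold total pauperism constant as well. The inputs are the Irish total coefficients as read from Table III; the variable names are illustrative, and the assignment of each printed value to a pair of variates is itself a reading of the table.

```python
import math


def partial(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y for constant z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Total (zero-order) coefficients for Ireland, as read from Table III.
# i = indoor paupers, t = total paupers, f = phthisis deaths, P = population.
r_if, r_iP, r_fP = +0.457, +0.763, +0.479
r_it, r_ft, r_tP = -0.251, +0.070, -0.684

# Step 1: hold the total population P constant.
r_if_P = partial(r_if, r_iP, r_fP)
r_it_P = partial(r_it, r_iP, r_tP)
r_ft_P = partial(r_ft, r_fP, r_tP)

# Step 2: additionally hold total pauperism constant.
r_if_tP = partial(r_if_P, r_it_P, r_ft_P)

print(f"indoor paupers and phthisis, P constant:        {r_if_P:+.3f}")
print(f"indoor paupers and phthisis, P and p_t constant: {r_if_tP:+.3f}")
```

On these inputs the final coefficient comes out close to the small negative Irish value quoted in the text.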


Thus our investigation of the relation between indoor pauperism and phthisis 
appears to be entirely opposed to Dr Newsholme’s conclusions. We find the 
segregation of paupers to have no substantial influence on deaths from phthisis. 
The one outstanding point at present, the relation between $p_t/P$ and $\phi/P$ after
annulment of the time-factor (see our p. 538), has no bearing on the segregation
problem of Dr Newsholme.


TABLE III.

Total and Partial Correlation Coefficients of Crude Numbers of Indoor and
Total Paupers ($p_i$ and $p_t$), Total Population ($P$), and Phthisis Deaths ($\phi$).

                     | Coefficient           | England with Wales |   Scotland   |   Ireland
Total Coefficients   | $r_{p_i\phi}$         |    -.932 ± .014    | -.718 ± .053 | +.457 ± .087
                     | $r_{p_i P}$           |    +.955 ± .010    | +.831 ± .034 | +.763 ± .046
                     | $r_{\phi P}$          |    -.950 ± .011    | -.896 ± .022 | +.479 ± .084
                     | $r_{p_i p_t}$         |    -.544 ± .077    | -.528 ± .079 | -.251 ± .103
                     | $r_{\phi p_t}$        |    +.577 ± .073    | +.780 ± .043 | +.070 ± .109
                     | $r_{p_t P}$           |    -.674 ± .060    | -.805 ± .038 | -.684 ± .058
Partial Coefficients | $_{P}r_{p_i\phi}$     |    -.287 ± .100    | +.111 ± .108 | +.162 ± .107
                     | $_{p_t}r_{p_i\phi}$   |         …          | -.575 ± .073 | +.492 ± …
                     | $_{p_t P}r_{p_i\phi}$ |         …          | -.017 ± .109 | -.305 ± .099


To approach nearer to the meaning of the relation between total pauperism
and phthisis we determined the correlation between $p_t$ and $\phi$ for constant $P$, and
found

$$_{P}r_{p_t\phi} = -.277 \pm .101,$$

which is barely significant having regard to its probable error.


Now after elimination of the time-factor, we found for the correlation of $\phi/P$
and $I_3$ at the eighth difference -.614 ± .143, but this is the same as the correlation
of $1/(p_t/P)$ and $\phi/P$. Hence the correlation of $p_t/P$ and $\phi/P$ must be very
significant, positive and of the order .6. Now if $p_t$ and $\phi$ after the removal of the
time-factor were practically independent of each other, there would be a high
positive correlation between $p_t/P$ and $\phi/P$, due to the fact that $P$, when it
takes (after annulment of the time-factor) any random deviation, appears in
both variates’ denominators. In other words, we are inclined to believe that
the high negative correlation between $\phi/P$ and $I_3$ is solely due to spurious
correlation arising from the nature of the indices used.
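The effect of a common denominator can be seen in a small simulation. This is a sketch on invented numbers, not on the pauperism returns: three mutually independent random variates are drawn, each with a coefficient of variation of 10 per cent., and the correlation of the two ratios is compared with that of the two numerators.

```python
import math
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5000
# Hypothetical, mutually independent variates (means and spreads are arbitrary).
p_t = [rng.gauss(100.0, 10.0) for _ in range(n)]
phi = [rng.gauss(50.0, 5.0) for _ in range(n)]
P = [rng.gauss(1000.0, 100.0) for _ in range(n)]

r_raw = pearson(p_t, phi)
r_ratio = pearson([a / c for a, c in zip(p_t, P)],
                  [b / c for b, c in zip(phi, P)])
print(f"numerators alone:          r = {r_raw:+.3f}")
print(f"each divided by the same P: r = {r_ratio:+.3f}")
```

With equal coefficients of variation the ratio correlation comes out in the neighbourhood of one half, although the numerators themselves are unrelated.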


To throw still more light on the matter we have investigated the correlation 
between the total number of paupers and the total number of deaths from phthisis 
when the time-factor approaches annulment. It will be seen from the table 
below that for both Scotland and Ireland there is finally no relationship at all 
between phthisis deaths and total pauperism. On the other hand, England with
Wales is tending to a value at least approaching to the crude correlation. We
have therefore this noteworthy result: England with Wales starts with a considerable
value and concludes with an equally great value, Scotland starts with
a high value and ends with a zero value, Ireland starts with an insignificant value
and ends with an insignificant value.


TABLE IV.
Relation between Total Pauperism ($p_t$) and Deaths from Phthisis ($\phi$).

             | England with Wales |   Scotland   |   Ireland
Crude values |    +.577 ± .073    | +.780 ± .043 | +.070 ± .109
Δ1           |    -.095 ± .134    | +.025 ± .135 | +.164 ± .132
Δ2           |    +.174 ± .151    | +.025 ± .156 | +.144 ± .152
Δ3           |    +.286 ± .158    | +.012 ± .172 | +.131 ± .169
Δ4           |    +.347 ± .163    | +.033 ± .185 | +.110 ± .183
Δ5           |    +.413 ± .164    | +.027 ± .198 | +.090 ± .196


If pauperism were causative of phthisis, it
is hard to believe that this would not manifest itself in the Scottish and Irish
returns; these negative any such hypothesis. It would appear that there are
essential differences in the treatment of pauperism in the three countries. I
suggest, but I cannot demonstrate the view, that phthisis itself leads to pauperism
in England, i.e. that the relatives of the phthisical breadwinner more often are
allowed to become paupers in England than in the sister countries. In other
words, that the only organic relationship between pauperism and phthisis we have
been able to discover may be due to a relatively harsh treatment in England of
the dependents of the phthisical.

To show how effectively the variate difference correlation method removes 
time influence, we may note that we correlated total population ($P$) with total
pauperism ($p_t$) and total phthisis deaths ($\phi$) with total population by this method,
with a view to ascertaining whether the relation between $p_t$ and $\phi$ would be
modified, if we determined it for constant population.


The following results were reached : 


TABLE IV bis. England with Wales.

             | Total Population and Total Pauperism | Total Population and Phthisis Deaths
             |          ($P$ and $p_t$)             |          ($P$ and $\phi$)
Crude values |           -.674 ± .060               |           -.950 ± .011
Δ1           |           +.457 ± .107               |           -.039 ± .135
Δ2           |           -.016 ± .156               |           -.205 ± .149
Δ3           |           -.022 ± .171               |           -.089 ± .170
Δ4           |           -.031 ± .185               |           +.002 ± .185


Thus we see that apart from the time-factor there is no relation whatever
between either pauperism or phthisis and population. In the relation between 
total pauperism and phthisis deaths, no further correction for population is 
needful than that obtained by the annulment of the time-factor as in Table IV. 
Table IV bis shows us that neither pauperism nor phthisis is organically related 
to population, although we might well have anticipated that greater density of 
population would influence pauperism and provide greater chances of infection, 
and so of deaths, in the case of phthisis. 
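
The way differencing removes a shared time trend can be illustrated numerically. The following is a minimal sketch and not the authors' computation: the two series, their slopes and the noise level are invented for illustration, and only the length of 38 annual values is taken from the text. Two independent series that both drift steadily with the time give a crude correlation close to −1, whereas the correlations of their successive differences are free of that trend-induced value.

```python
import math
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def difference(series, order):
    """Return the difference of the given order of a series."""
    for _ in range(order):
        series = [later - earlier for earlier, later in zip(series, series[1:])]
    return series


rng = random.Random(0)  # fixed seed so the run is repeatable
years = range(38)       # 38 annual values, as in the text
# Hypothetical series: independent noise around opposite linear trends.
x = [100 + 3.0 * t + rng.gauss(0, 1) for t in years]
y = [500 - 5.0 * t + rng.gauss(0, 1) for t in years]

crude = pearson(x, y)
diff_corrs = [pearson(difference(x, k), difference(y, k)) for k in range(1, 5)]

print(f"crude r = {crude:+.3f}")
for k, r in enumerate(diff_corrs, start=1):
    print(f"difference {k}: r = {r:+.3f}")
```

With only 38 values the difference correlations still fluctuate by chance, which is why the paper attaches a probable error to each of them.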


(5) We now come to Dr Newsholme’s fourth and last measure of segregation. 
It is “the ratio in which the number of paupers treated in workhouses and workhouse 
infirmaries stand to the total number of deaths in the community” (p. 276). 
In our notation this is p_i/φ, or as an index 100p_i/φ. But in the figures actually 
given in Table LXV (p. 277), and headed Segregation Ratio, Dr Newsholme 
appears to be using 100φ/p_i. The same remark applies to Tables LXVIII and 
LXIX (pp. 280-281). Thus it is difficult to be certain of what Dr Newsholme 
intends to be taken as his fourth measure of segregation. In our discussion below 
we have used both 100φ/p_i and 100p_i/φ to provide for both contingencies and 
to check our results. 


Unfortunately Dr Newsholme makes little attempt to justify either his third 
or fourth ratio as an approximate measure of segregation. It will be remembered 
that he has defined the true method of measuring segregation to consist in 
forming the ratio “stating how many of the total days of sickness (number of 
patients and number of days of sickness) are passed in institutions” (p. 267). In 
this fourth index of segregation he replaces phthisical patients in institutions by 
indoor paupers, and total of phthisical patients by total deaths from phthisis, 
dropping any question of the number of days of sickness. At the very least this 
seems to involve two assumptions, (a) either that all indoor paupers are phthisical 
or that for the period in question the proportion of indoor paupers who are 
phthisical has remained constant, (b) that for the period in question the number 
of deaths from phthisis has remained a constant fraction of the total number of 
cases of phthisis. It is difficult to see how, without such assumptions, such figures 
can “measure with approximate accuracy the ratio which states how many of the 
total days of tuberculous sickness are passed in institutions” (p. 267). 


Yet in another paragraph Dr Newsholme quotes with apparent approval the 
statement of Mr Fleming, who speaks of the “great change in the character of 
workhouse inmates during recent years....The able-bodied inmates are gone and 
the sick inmates have come” (p. 273). Such a statement is absolutely inconsistent 
with the assumption (a) above. 


To justify (b) we must assert that for the last fifty years of the nineteenth 
century there has been no change in efficiency of treatment in the case of tuber- 
culosis, for without this we cannot assume that deaths from phthisis are even an 
approximate measure of the number of cases (p. 267). The fact that the reduction 
in the phthisis deathrate has been substantially different for the different age 
groups, and is especially marked in the case of children, seems to indicate that 
recovery, at least from puerile phthisis, is more frequent now than formerly. 
However, not to spend more time on these assumptions, which it appears to us 
Dr Newsholme has by no means justified, let us examine whither this fourth 
method of approximately measuring segregation leads us. Table V gives the 
necessary coefficients. 


TABLE V. 
Correlation and Difference Correlations of 10°φ/P and 100p_i/φ or 100φ/p_i. 

Variate    England with Wales             Scotland                       Ireland
10°φ/P
with       100p_i/φ       100φ/p_i        100p_i/φ       100φ/p_i        100p_i/φ       100φ/p_i

Crude      −.760 ± .046   +.976 ± .005    −.861 ± .028   +.944 ± .012    −.712 ± .054   +.666 ± .061
Δ1         −.868 ± .033   +.848 ± .038    −.755 ± .058   +.772 ± .055    −.819 ± .045   +.707 ± .068
Δ2         −.879 ± .035   +.875 ± .037    −.824 ± .050   +.834 ± .047    −.922 ± .023   +.755 ± .067
Δ3         −.895 ± .034   +.874 ± .041    −.809 ± .059   +.824 ± .055    −.954 ± .015   +.791 ± .064
Δ4         −.895 ± .037   +.860 ± .048    −.811 ± .064   +.805 ± .065    −.964 ± .013   +.805 ± .065
Δ5         −.898 ± .038   +.847 ± .056    −.786 ± .076   +.788 ± .075    −.970 ± .012   +.831 ± .061
Δ6         −.907 ± .037   +.850 ± .058    −.788 ± .079   +.794 ± .077    −.973 ± .011   +.848 ± .059
Δ7         −.917 ± .035   +.835 ± .067    −.792 ± .082   +.791 ± .082    (none)         +.857 ± .056*


Now this table at any rate demonstrates a very high correlation between φ/P 
and p_i/φ, while the previous table for Dr Newsholme’s third approximate segregation 
ratio led in the case of England with Wales to the value −.587, and in 
the case of Scotland and Ireland to negligibly small values! Dr Newsholme 
himself writes: “ Any of these indirect forms of segregation ratio has therefore 
to be verified wherever possible by the application to the same community and 
period of one or more other forms of the ratio, and checked where practicable 
by a special examination of sample constituent communities whose figures are 
included in the total. This has been done so far as the information obtainable 
allowed. It will be seen that the results obtained by applying different ratios to 
the experience of the same country and period are usually, though not invariably, 
in good agreement ” (p. 268). 


What is quite clear from the above results is that, while in the case of 
Dr Newsholme’s two chief measures of segregation, there is very sensible difference 
in the case of England with Wales, there is an absolute discordance in the cases of 
both Scotland and Ireland. Accordingly on the basis of his own axiom, that we 
must check our results by application of one or more other forms of the ratio, 


* This correlation continues to rise until it reaches .929 with the thirteenth difference, but with such 
high differences the “population” is so reduced that the method ceases really to be reliable. 
we are bound to reject these ratios as even approximate measures of segregation*. 

But it would not be satisfactory to leave the matter here and not provide some 
explanation of why this fourth segregation ratio, both before and after the annulment 
of the time-factor, leads to such high correlations. Luckily the matter is 
capable of a perfectly straightforward and obvious explanation, which would have 
been anticipated had Dr Newsholme had in mind the danger of “spurious correlation.” 


What he is correlating are essentially φ/P and p_i/φ. The latter may be 
written (p_i/P)/(φ/P). Now p_i/P is practically constant during the period in 
question. Hence Dr Newsholme is correlating φ/P with 1/(φ/P), or a variate 
with its reciprocal. In other words we may anticipate something very closely 
approaching perfect correlation. The deviation from such correlation arises from 
the fact that p_i/P is not absolutely steady, although its variations are very probably 
nearly random. The assertion therefore that this fourth measure of segregation 
assists in demonstrating the close relation between the fall in the phthisis deathrate 
and institutional segregation is based on a fallacy which entirely overlooks 
“spurious correlation.” 
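
The argument of the preceding paragraph can be checked by a small simulation. This is only a sketch under invented assumptions: the range of the deathrate, the level of the indoor pauper rate and its 2 per cent. random fluctuation are hypothetical, chosen to mimic a ratio that is "practically constant." The deathrate φ/P is then correlated with both forms of the index, p_i/φ and φ/p_i.

```python
import math
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the run is repeatable
n_years = 38

# Hypothetical phthisis deathrate (phi/P), drawn at random for each year.
death_rate = [rng.uniform(1.0, 3.0) for _ in range(n_years)]
# Hypothetical indoor pauper rate (p_i/P): nearly constant, small random wobble.
indoor_rate = [0.5 * (1 + rng.gauss(0, 0.02)) for _ in range(n_years)]

# The two forms of the fourth "segregation ratio" (constant factors omitted).
ratio_p_over_phi = [p / d for p, d in zip(indoor_rate, death_rate)]
ratio_phi_over_p = [d / p for p, d in zip(indoor_rate, death_rate)]

r_neg = pearson(death_rate, ratio_p_over_phi)
r_pos = pearson(death_rate, ratio_phi_over_p)
print(f"phi/P with p_i/phi: r = {r_neg:+.3f}")
print(f"phi/P with phi/p_i: r = {r_pos:+.3f}")
```

No segregation effect of any kind has been put into these figures, yet both correlations come out numerically high, the one negative and the other positive, as in Table V.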


It will be seen therefore that not one of Dr Newsholme’s methods of reaching 
an approximate measure of the segregation is satisfactory, and they lead to con- 
tradictory and inconclusive results. Whether there is any really substantial 
relation between the prevalence of phthisis and institutional segregation we 
do not yet know. All we can say is that Dr Newsholme has entirely failed to 
demonstrate it, if it actually exists. 


(6) Before concluding this paper it may be of interest to judge how far it 
justifies the application of the method of variate difference correlation to such 
problems as are here dealt with. 


In the first place, the correlations of successive differences should approach 
steady values. This is generally—as the reader can judge by examining Tables I, 
II, IV and V—but not invariably, the case. The test cannot, however, be com- 
pleted, as the method ought not to be pressed to such high differences that the 
order of the difference is a large percentage of the original “ population.” 


We doubt whether it is advisable to carry differences beyond the 8th in a 
population of 38. A reduction of 20% to 25% in the population is surely as much as 
it is safe to allow where the original population is so small in number. It is true 
that a population of 38 itself is capable of exciting the derision of trained 


* Under the circumstances it is, perhaps, unnecessary to draw attention to Dr Newsholme’s statement 
that “the specific result of pauper segregation must have been lower in Ireland than in England or 
Scotland” (p. 282). Free of the time-factor the correlations of phthisis deathrate and Dr Newsholme’s 
fourth segregation ratio are higher in Ireland than in England or Scotland. This criticism as well as 
Dr Newsholme’s original remark are of no importance, because the fourth segregation ratio correlation 
is entirely spurious. 
statisticians, and ought never to be used where hard work can produce larger 
numbers. But in annual returns, as has been indicated by others, a period of 
30 to 50 years is often the maximum attainable, and we must take what we can 
get. In the present case the probable errors of the difference correlations—based 
on the Andersonian formulae for steady conditions—show us that we can form 
fairly legitimate conclusions from the results reached. 


A second test that we have applied is the approach to the theoretical values 
in the function $\sigma^2_{\delta_m x}/\sigma^2_{\delta_{m-1} x}$, where $\delta_m x$ is the $m$th difference of the variate $x$. 
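
For a purely random variate (independent values of equal variance) the variance of the mth difference is C(2m, m) times the variance of the variate, so that the ratio of the variances of successive differences is (4m − 2)/m. That these are the theoretical values the authors have in mind is an assumption, since the formula itself is not printed in this passage. A short sketch tabulating them:

```python
from math import comb


def difference_variance_factor(m):
    """Variance of the m-th difference of independent unit-variance values.

    The m-th difference has binomial coefficients as weights, so its variance
    is the sum of their squares, which equals C(2m, m).
    """
    return sum(comb(m, k) ** 2 for k in range(m + 1))


def theoretical_ratio(m):
    """Ratio of the variance of the m-th difference to that of the (m-1)-th."""
    return difference_variance_factor(m) / difference_variance_factor(m - 1)


for m in range(1, 9):
    print(f"m = {m}: ratio = {theoretical_ratio(m):.4f}")
```

The ratio starts at 2 for the first difference and rises steadily towards 4, so a steady rise of the observed ratios is the behaviour to look for.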


The following table shows that there is a reasonable approach to these 
theoretical values in the calculated standard deviations of the differences, and 
suffices to justify the application of the variate difference method within the 
limits of practical statistics. We have continued the differences beyond the 
values used in some of the correlation results to indicate the sort of irregularities 
which may be expected to occur when using high differences in small populations. 
Terminal irregularities then begin to affect the uniform rise of $\sigma^2_{\delta_m x}/\sigma^2_{\delta_{m-1} x}$. 
THE INFLUENCE OF ISOLATION ON THE 
DIPHTHERIA ATTACK- AND DEATH-RATES. 


By ETHEL M. ELDERTON, Galton Research Fellow 
AND KARL PEARSON, F.R.S. 


(1) Introductory. The problem of the advantages of isolation, not only in the 
case of diphtheria but of other diseases of an infectious character, is likely, owing 
to modern views as to “carriers” and other sources of transmission, to be much 
discussed in the near future. It is therefore well to consider what may be learnt 
from the statistics available. The questions which naturally arise are of the 
following kind: 

(i) In districts with a maximum of isolation is there a minimum of incidence? 

(ii) In districts with a maximum of isolation is there a minimum deathrate 
from the disease isolated ? 

There cannot be the slightest doubt that, if these two questions were answered in 
the affirmative and we could show that the incidence was markedly less and the 
deathrate significantly smaller in districts where isolation was most stringently 
carried out, then these results would be advanced as a strong argument in favour 
of isolation. 

To the trained statistician, however, no conclusion based upon such results 
without much further analysis would have any validity. To illustrate this point, 
let us consider the hypothetical case that medical or popular opinion in a given 
town has been persistently in favour of increasing the isolation-rate, and further 
suppose that in this district improved economic conditions have increased the 
immunity, or bettered sanitation lowered the incidence, while at the same time 
new methods of treatment have lowered the deathrate of the disease; it will be 
clear that in considering the statistical results over a course of years we should 
find a high isolation-rate negatively correlated with both the incidence- and the 
death-rates. Thus if we considered this correlational as a causal nexus, we 
should be raising an apparently strong argument in favour of a maximum of 
isolation, which would be based on the statistical fallacy, that when two quantities 
are both changing continuously with the time, this must of itself denote a causal 
relation. 
In precisely the same way a positive correlation between the isolation rate and 
the attack- or death-rates by no means justifies us in asserting that isolation is 
worse than non-effective. It is conceivable that in the period or the district 
under consideration with an increasing isolation-rate there might be decreased 
immunity in the population, greater virulence of the disease, or even a limit 
to the available isolation accommodation, so that in the case of attacks of an 
epidemic nature the isolation rate would not increase proportionately to the cases, 
or indeed might even diminish*. Further, if apart from the changes in a single 
district, we consider a great variety of districts, it may chance that the greatest 
isolation-rate occurs in those districts where the disease has been found most 
prevalent, because it appeared the most obvious remedy, and thus a greater attack- 
or death-rate would be no real measure of the futility of high isolation. 


If, however, it should turn out that on the whole the higher isolation rate is 
associated with the higher attack-rate or the higher death-rate then it will be clear 
(i) that there is ground for demanding a closer investigation as to the advantages 
of isolation, and (ii) that we may be overlooking the real method, or at least one 
or more important factors, of the transmission of the disease. It is conceivable 
that isolation of all cases during attack may be of far less importance than isolation 
of certain special cases for a shorter or longer period well subsequent to the 
attack, and after they would normally have resumed their ordinary avocations†. 


The main problems which arise are accordingly these : 


(i) Have isolation-, attack- and death-rates changed continuously with the 
time, and are the apparent correlations really suggestive of causal relationships ? 


(ii) Are associations between isolation-rate and attack- and death-rates really 
spurious arising from the fact that where the attack- and death-rates have been 
severe there the remedy which appeared nearest to hand was more isolation ? 


* For example, if there were only 100 hospital beds available, and out of 100 cases 50 were sent to 
hospital, the isolation-rate would be 50%; but if in the next year there were 300 cases and all the beds 
were used, the isolation-rate would be only 33%. Thus limited accommodation may tend to produce 
a negative correlation between isolation-rate and attack-rate, so that a positive correlation between these 
two rates may be of more importance than its apparent significance. It is extremely probable that 
some of the falls in isolation-rates are really due to an increase of incidence, so that the same 
percentage of cases cannot be met by the available hospital accommodation. 

† It is, on the hypothesis of natural selection, a plausible view that the parasites (including under 
this term all disease organisms) which ultimately survive must tend to become innocuous to their 
hosts, and thus the decreasing virulence of certain diseases may be accounted for. The organism is 
destroyed owing to the death of the host or its own death at his recovery, or it has been modified by 
selection so as to become innocuous to its host relative to his immunity. But immunity is a matter of 
personal equation, and thus the function of the “carrier” in preserving and spreading a conceivably less 
nocuous form of the organism becomes clearer. We are not unaware of the view that the organism 
remains the same, but that the immunity is increased owing to “practice” of the leucocytes, but such a 
view requires the assumption of inheritance of acquired characters to explain reduced disease virulence, 
and further compels us to assume two types of immunity, the one which destroys the organism, and 
the other which without modifying it, establishes so to speak a mutual modus vivendi. 
(iii) Are the districts which have adopted most isolation really urban 
districts where isolation was easiest to adopt and where possibly economic or 
social conditions favoured the spread of the disease or, in the case of the death- 
rate, the disease encountered a less resistant population ? 


(iv) What evidence is there to show that the districts which have rapidly 
increased their isolation-rates have subsequently lower attack- or death-rates ? 


If no one of these problems can be fully answered,—even in the case of a single 
disease—with the data at present available, at least light can be thrown on the lines 
which their solution in the future must take; and further something can be done 
to prevent hasty generalisation and excessive dogmatism as to the advantages or 
disadvantages of the isolation system. It can never be too strongly insisted upon, 
because it is so often forgotten, that preventive medicine is essentially an 
experimental science, and that in nine cases out of ten the efficiency of any line 
of action can only be adequately tested by statistics and by statistics collected after 
the expenditure of many thousand pounds, possibly spread over a long period of 
years, in carrying out this line of action *. 


(2) Material. In endeavouring to throw some light on the above problems we 
have fortunately received data of very considerable value from Dr E. H. Snell, the 
Medical Officer of Health for the City of Coventry. He obtained for a period of 
nine years, 1904-1912 inclusive, for about eighty towns or districts of large popula- 
tion but of very varying local conditions, (i) the annual number of diphtheria cases, 
(ii) the number removed to hospital, (iii) the number of deaths. We have added 
to this material the estimated population of the town or district, and further 
certain data as to the economic and social conditions. Unfortunately there is no 
existing adequate measure of the general sanitary condition of individual towns, 
although the construction of a general sanitary “index number” would be of 
remarkable value in many forms of inquiry. We took as our measures of social 
condition : 


(a) Death-rate of infants under a year. 


(b) Amount of overcrowding, that is to say the percentage of the population 
in private families living more than two in a room. 


(c) Density of population, i.e. the number of persons to the acre. 


* Assert that it is most desirable to test the effect of sanatoria and of tuberculin in cases of 
tuberculosis, but do not dogmatically proclaim them as “cures” for phthisis, until statistics have been 
collected in sufficient amount and have been adequately and dispassionately examined to prove or 
disprove your statements. Insist on compulsory inoculation for enteric in the case of all recruits, but 
do not make it optional and then publish letters in the newspapers giving perfectly idle statistics, or 
go round to the camps giving popular lantern lectures to the recruits showing the gravestones of 
uninoculated persons, the portraits of persons dying of enteric, or much enlarged pictures of bacilli! 
If you think it experimentally worth doing, inoculate ; but don’t bring inoculation about by emphasising 
the dread of pain or the fear of death, both of which it is the first essential for a soldier wholly to 
disregard. 
(d) Economic prosperity as measured by the number of indoor and outdoor 
servants of both sexes per 100 private families. 


Our data are based on the census of 1911 as providing more ample information 
on these points. It will, we think, be admitted that the list of towns dealt with 
provides a very fair sample of the urban populations of this country. It ranges 
from manufacturing towns* like Preston, Rochdale and Bolton, mining and iron 
towns like Rhondda, Wigan and Middlesbrough, sea-ports like Hull, Liverpool 
and Southampton, to county towns like York and Reading, watering places like 
Brighton and Blackpool, suburban districts like Acton and Hornsey, and residential 
towns like Oxford or Bath. We ought from such a list to be able to throw some 
light on the relation of isolation to incidence under a variety of social conditions, 
if indeed these latter are factors in the problem at all. 


(3) What are the crude correlations between Isolation-Rate, Attack-Rate and 
Mortality-Rate? The isolation-rate (I) has been measured as the average per 
cent. of cases removed to hospital during a five or four year period. We have two 
such periods, the earlier period 1904-1908 and the later period 1909-1912. The 
attack-rate has been measured per 1000 of the population, uncorrected for age 
distribution. Since diphtheria is largely a disease of infancy and childhood this 
neglect of the age correction—the reduction to a standard population—may seem 
serious. But in the first place we had not the age incidence in the individual 
districts, and in the second place we satisfied ourselves that such correction, if it 
could have been made, would not substantially modify any argument we have 
based on our data. For we calculated the attack-rate (A’) on the population 
under 15 years of age, as well as the attack-rate (A) on the total population of the 
districts. We found the correlation between the two methods of measuring the 
attack-rate was +°972, which indicates how close is the relation between the two 
methods of measuring the attack-rate and how little influence small variations 
in the proportion of less immune persons in the population due to age differences 
could have on the results†. 


The attack-rate (A) has been measured as the number of cases per 1000 of the 
population. The mortality-rate has been measured in two different ways; first as 
the population mortality, the death-rate in the ordinary sense (M) or the deaths 
per 1000 of the population; and secondly the case death-rate or the mortality (m) 
per 100 attacked. We now give the crude correlations between I and A. 
They are: 

First Period: 1904-1908, r_IA = +.427 ± .063, 
Second Period: 1909-1912, r_IA = +.290 ± .069. 
* See table, p. 567, for 76 of the 80 towns, the four others with full data only for the second period 
being Reading, Stoke, Dewsbury and Edmonton. 
† The formula giving the juvenile attack-rate A′ in terms of the crude attack-rate A is: 
A′ = 1.3094 A + .0164 
with a probable error of .1369. 

Thirty-three towns were selected at random out of the 80, and gave the following results for A′ 
calculated from A and A′ as observed. The theoretical mean error = .162; the mean error of the defects 
Thus both periods show significant if not very large correlation*. The difference 
(.137 ± .093) between the coefficients for the two periods is, however, probably 
not significant. Thus in towns with greater isolation-rate there is certainly a 
higher attack-rate, and equally certainly no argument can be based on the crude 
figures to prove that the more the isolation the less prevalent is diphtheria. We 
will now turn to the death-rate M, and we find: 

First Period: 1904-1908, r_IM = +.153 ± .075, 

Second Period: 1909-1912, r_IM = [illegible]. 
In the first period isolation was associated with a higher diphtheria death- 
rate, in the second period with a lower diphtheria death-rate, but neither are 
of any real significance. Thus all we can conclude from the crude figures is 
that they show no evidence that isolation has reduced the general death-rate from 
diphtheria. 
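
The probable errors quoted with these coefficients can be reproduced approximately. This is a sketch only. It assumes the customary formula of the period, .6745(1 − r²)/√n, which this passage does not itself state, and it assumes 76 towns in the first period and 80 in the second, as the footnote on the list of towns suggests. The probable error of the difference between the two isolation and attack-rate coefficients is taken as the root of the sum of squares of the two printed probable errors.

```python
import math


def probable_error(r, n):
    """Probable error of a correlation coefficient r from n observations.

    Assumed formula: 0.6745 * (1 - r^2) / sqrt(n).
    """
    return 0.6745 * (1 - r * r) / math.sqrt(n)


# Isolation-rate and attack-rate, as printed: +.427 and +.290.
pe_first = probable_error(0.427, 76)   # assumed 76 towns, first period
pe_second = probable_error(0.290, 80)  # assumed 80 towns, second period

# Difference of the two coefficients and its probable error,
# combining the printed probable errors .063 and .069.
difference = 0.427 - 0.290
pe_difference = math.hypot(0.063, 0.069)

print(f"first period:  {pe_first:.3f}")
print(f"second period: {pe_second:.3f}")
print(f"difference:    {difference:.3f} +/- {pe_difference:.3f}")
```

On these assumptions the difference is well under twice its probable error, which agrees with the statement that it is probably not significant.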


We next take the case death-rate (m) and we have for the two periods: 

First Period: 1904-1908, r_Im = −.509 ± .057, 
                         r_Am = −.527 ± .056, 
Second Period: 1909-1912, r_Im = −.534 ± .054, 
                          r_Am = −.495 ± .057. 


is −.153 and of the excesses +.134; this shows very fair accordance, 17 deviations being positive and 
16 negative. The greatest deviations occur in Hornsey, Bath and Brighton, where residential neighbourhoods 
show fewer children, and in Edmonton, Walthamstow and Rhondda where there are probably 
excess of children. On the whole the general order is very well maintained, and the general attack-rate 
closely fixes the juvenile attack-rate. In any further collection of material, it would of course be well 
to have the age-distribution of cases. 

Town            Observed A′    Calculated A′   Δ
Derby           [illegible]    4.25            −.17
Southampton     2.67           2.70            +.03
Hornsey         [illegible]    1.86            −.82
Bristol         [illegible]    2.23            −.10
Reading         [illegible]    1.95            −.18
Nottingham      2.09           1.99            −.10
Salford         2.04           2.16            +.12
Ilford          1.98           2.09            +.11
Brighton        1.94           1.68            −.26
Stockton        1.90           [illegible]     +.22
Ipswich         1.87           1.89            +.02
Grimsby         1.85           1.90            +.05
Walthamstow     1.85           2.10            +.25
Coventry        1.84           1.93            +.09
Plymouth        1.61           1.56            −.05
Wakefield       1.40           1.39            −.01
Smethwick       [illegible]    [illegible]     [illegible]
Edmonton        1.36           1.80            +.44
Bath            [illegible]    .87             −.33
Newport         [illegible]    1.25            +.11
Rhondda         [illegible]    1.34            +.24
Bury            1.08           .91             −.17
Rotherham       1.06           1.18            +.12
Dewsbury        1.01           1.01            ±.00
Blackburn       .99            .91             −.08
Manchester      .94            .97             +.03
Oxford          [illegible]    .79             −.10
Bolton          .86            .85             −.01
Rochdale        .82            .75             −.07
Northampton     .78            .76             −.02
Barnsley        [illegible]    .88             [illegible]
Wigan           .58            .64             +.06
W. Bromwich     .45            .53             +.08

* We endeavoured to see whether the correlation of isolation- and attack-rates would be modified if 
we took the attack-rate on children under 15 years. This made little difference, r being raised only 
from +.290 ± .069 to +.315 ± .068. 
According to these correlations, when or where the isolation-rate is high, the 
case mortality is low. Further when or where the attack-rate is high the case 
mortality is low. Now we know that: 


I = 100 × isolated cases ÷ all cases, 
A = 1000 × all cases ÷ population, 
m = 100 × deaths ÷ all cases. 


Hence if we selected the number attacked at random and chose the deaths to 
be simply some number less than this, we should expect to find a considerable 
negative correlation between A and m; and as we actually do find such a correlation, 
we cannot be certain that the actually observed values of r_Am are not due 
to “spurious correlation.” If they were “organic” we should interpret them to 
mean that a widespread epidemic (A large) was a less virulent epidemic (m small). 
On the other hand the spurious correlations of I and m would be positive in value, 
while the actual correlations are negative. Thus it would seem that while a high 
isolation-rate is associated with high attack-rate, it must be “organically” asso- 
ciated with a lessened case mortality. In other words while isolation does not, on 
the crude figures, appear to lessen the frequency of disease, it does appear to 
lessen the mortality among the attacked. This result appeared to be of such 
very great importance, if thoroughly established, that we determined to inquire 
into it further. It seemed reasonable to believe that the bulk of persons attacked 
might have better care in a hospital than in their own homes and thus isolation 
indirectly lessen the ill effects of the disease. 
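
The signs of the spurious correlations described above can be seen in a small simulation. It is a sketch under invented assumptions: every hypothetical district has the same population, and the numbers of cases, of isolated cases and of deaths are drawn independently of one another, the last two being kept below the number of cases. This is one reading of "chose the deaths to be simply some number less than this." Nothing organic connects the three counts, so any correlation between the rates arises solely from the shared divisor "all cases."

```python
import math
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2)    # fixed seed so the run is repeatable
population = 100_000      # hypothetical, identical for every district
n_districts = 2000        # many districts, to steady the result

# Independent counts; isolated cases and deaths never exceed the cases.
cases = [rng.randint(100, 1000) for _ in range(n_districts)]
isolated = [rng.randint(5, 100) for _ in range(n_districts)]
deaths = [rng.randint(5, 100) for _ in range(n_districts)]

# The three rates as defined in the text.
A = [1000 * a / population for a in cases]
I = [100 * i / a for i, a in zip(isolated, cases)]
m = [100 * d / a for d, a in zip(deaths, cases)]

r_Am = pearson(A, m)
r_Im = pearson(I, m)
print(f"attack-rate and case mortality:    r = {r_Am:+.3f}")
print(f"isolation-rate and case mortality: r = {r_Im:+.3f}")
```

The first coefficient comes out negative and the second positive, which is why an observed negative correlation of I with m cannot be put down to spurious correlation of this kind.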


We accordingly endeavoured to approach the problem from a somewhat 
different standpoint: Given two districts with the same total number of persons 
attacked (a), will that district with more isolated (i), have fewer or more deaths (d)? 
The answer to this question depends on whether the partial correlation coefficient 
of total isolated cases with total deaths for constant number attacked is negative 
or positive. We found: 


Correlations                                             First Period    Second Period
                                                         1904-1908       1909-1912
r_id = Isolated Cases and Deaths                         +.860 ± .020    +.867 ± .019
r_ia = Isolated Cases and Attacked                       +.937 ± .010    +.968 ± .005
r_ad = Attacked and Deaths                               +.907 ± .014    +.918 ± .012
r_id.a = Isolated Cases and Deaths (Attacked constant)   +.066 ± .077    −.220 ± .072


Thus in the first period for a given number of attacked more isolation was 
associated with more deaths, and in the second period for a given number of 
attacked, with fewer deaths; but in both periods, having regard to the probable 
errors, we cannot assert any real significance, or be reasonably certain that where 
there is more isolation, there recovery is more likely to occur. 
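
The partial coefficients can be recomputed from the three total correlations. The sketch below uses the usual first-order partial correlation formula, which is assumed here and not quoted from the paper; the function and variable names are illustrative only. The inputs are the printed values for isolated cases, attacked and deaths.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# x = isolated cases (i), y = deaths (d), z = attacked (a).
first_period = partial_correlation(r_xy=0.860, r_xz=0.937, r_yz=0.907)
second_period = partial_correlation(r_xy=0.867, r_xz=0.968, r_yz=0.918)

print(f"1904-1908: {first_period:+.3f}")
print(f"1909-1912: {second_period:+.3f}")
```

The results lie within a few units in the third decimal place of the printed +.066 and −.220; the small differences are what one would expect from starting with coefficients already rounded to three places.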


We shall see later that the correlation between I and m for constant total 
number of attacks is not the same thing as the correlation of the total isolated and 
total deaths for constant total number of attacks. And this divergence, often in a 
marked degree, of partial correlations for rates and for absolute numbers is not 
unfamiliar to those who have had to deal with disease statistics. In the present 
case it renders still more obscure any argument drawn in favour of isolation from 
apparently lesser case mortality. 


(4) On the degree to which “spurious correlation” may be influencing the 
attack- and death-rates. It seemed desirable if possible to throw further light on 
this point and accordingly we correlated attack- and death-rates with the total 
population. It will be remembered that: 

A = 1000 × cases ÷ population, 

M = 1000 × deaths ÷ population, 
and accordingly if A and M be correlated with the population P, we might 
anticipate that if cases and deaths had no relation to population, there would be 
a high negative correlation arising from A and M both varying inversely as P. 
We were comforted by finding practically insignificant positive correlations. Thus: 


Correlations                            First Period     Second Period
                                        1904-1908        1909-1912
r_PA = Population and Attack-rate       +.137 ± .075     +.054 ± .075
r_PM = Population and Death-rate        [illegible]      +.116 ± .074
r_PI = Population and Isolation-rate    [illegible]      +.102 ± .075


The last correlation coefficients show us that there is very little relation between 
the size of a population and the amount of isolation practised. Further these 
isolation correlations in which there is no obvious source of spurious correlation 
are as significant as those of population with attack- and death-rates where the 
possibility of “spurious correlation” is manifest. We conclude accordingly that 
risk is more uniformly distributed over population than we had anticipated, and 
that the correlations between the three rates I, A and M are really open to 
“organic” interpretation. 

The next point which arises for discussion is whether the presence of the total 
number attacked (a) in the rates I and m can produce spurious correlation. If so 
we should anticipate that the absolute number a would be negatively correlated 
with both isolation and case mortality rates. We found: 


Correlations                                  First Period     Second Period
                                              1904-1908        1909-1912
r_aI = Total attacked and Isolation-rate      +.264 ± .072     +.226 ± .072
r_am = Total attacked and Case-Mortality      [illegible]      −.203 ± .072


The first set of these coefficients are not even negative and therefore cannot 
be due to “spurious correlation,” although such correlation may have reduced 
their organic values. They admit, however, of an easy interpretation, namely 
that: where the number attacked has been large the isolation has been more 
practised. The second set of coefficients might be due to spurious correlation, but 
they again admit of a simple interpretation as apart from “spurious correlation,” 
namely that: when the attacks are numerous the deaths are relatively few, 
because a wide-spread epidemic means a mild epidemic. All four coefficients are 
significant, and pair and pair they are quite consistent but in no case are they 
of any marked importance. They enable us, however, to correlate the isolation-rate 
and the case-mortality for a constant total attacked, i.e. to find the partial 
correlation r_Im.a. We have the following results: 
                                                     First Period      Second Period
                                                     1904-1908         1909-1912
$_a r_{Im}$ = Correlation of Isolation-rate and
   Case mortality for constant number attacked       −.474 ± .056      −.512 ± .057

while we have already found:

$_a r_{id}$ = Correlation of number isolated with
   number of deaths for constant number attacked     +.066 ± .077      −.220 ± .072
Correlation of Total Numbers Isolated and Total Registered Deaths.

[Correlation table: columns are total numbers isolated, in 13 groups of 250 from 0 to 3250; rows are deaths registered, in groups of 75. The individual cell frequencies are not recoverable from the source; the row totals and the legible means survive.]

Deaths Registered     Totals
   0— 75                94
  75—150                36
 150—225                 9
 225—300                 4
 300—375                 6
 375—450                 5
 450—525                 1
 525—600                 1
 600—675                 0
 675—750                 0
 750—825                 1
 Totals                157

Means of deaths for the column arrays, as far as legible: 54.2, 59.8, 112.5, 133.9, 932.5 [sic], 312.5, 382.5 at 2125 isolated; general mean 103.4.


It will therefore be clear that removing the variation in number attacked has
made only slight reductions in the values of the correlation coefficients between
isolation-rate and case-mortality. The discrepancy between the absolute numbers’
and the rates’ correlations is not to be accounted for by “spurious correlation”
involved in the use of total numbers attacked in both rates. It must therefore be
due to: (i) lack of linearity in certain of the regressions, (ii) high values in the
coefficients of variation in certain of the quantities under discussion, or to a
combination of these causes. With the small size of the populations under discussion
it is by no means easy to test the true linearity of the regressions, even if we do
what appears legitimate in this case, namely pool our data for the earlier and
later periods. Our actual correlations have all been found without grouping by
the direct product-moment formula, but we give on pp. 556 and 558 two grouped
tables to illustrate the difficulties which arise in analysis. Our first table is for
the total numbers isolated and the total deaths registered. It will be seen at once
that the marginal distributions are intensely skew, crowding up into the corner of
few deaths and few cases isolated, so that they appear to asymptote to the zero
values of the coordinates. Further, Diagram I shows that the regression curve
of deaths on total number isolated is, if just sensibly, still not markedly skew.

[Diagram I. Regression of total deaths on total isolated.]

Turning to the actual numbers given by this table we have the following series
of constants:
                                    Numbers Isolated (i)          Registered Deaths (d)
Mean                                $\bar{i}$ = 475.33            $\bar{d}$ = 103.42
Standard Deviation                  $\sigma_i$ = 571.25           $\sigma_d$ = 118.61
Coefficient of Variation            $v_i$ = 1.20                  $v_d$ = 1.15
   (= S.D./Mean)
Correlation Coefficient and Ratio   $r_{id}$ = .8348* ± .0163     $\eta_{di}$ = .8564†

* Agrees reasonably well with the non-grouped values for the two separate periods.

† Found by taking means of all 13 column-arrays.
Clearly these results are of much interest; they show that the difference of $\eta$
for deaths on isolation over $r$ is not as great numerically as, perhaps, the graph
suggests, but they indicate the markedly high values for the coefficients of
variation. Now it is quite straightforward algebra to prove that

$$ {}_a r_{i/a,\; d/a} = {}_a r_{i,\; d} $$

provided we may neglect terms of the square and product order in $v_i$ and $v_d$
compared with unity, and this is perfectly legitimate when these coefficients of
variation are, as is usual in anthropometric measurements, quite small quantities.
But in the present case these quantities are greater than unity and their squares
are not negligible as compared with unity, thus we need not be surprised at the
marked inequality of ${}_a r_{i/a,\; d/a}$* and ${}_a r_{i,\; d}$† found above. The values of the former
show a marked relation between the case mortality and the isolation-rate, and the
values of the latter indicate no appreciable betterment in the deaths due to
increased isolation. Before we consider which of these coefficients gives us in the
present case the better result as a guide to practical conduct, let us examine the
correlation table for isolation-rate and case-mortality for the same 157 observations.
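
The size of the neglected terms can be checked directly from the constants quoted for the grouped table. The following is a minimal Python sketch, not part of the original paper; the variable names are ours, and the only inputs are the printed means and standard deviations of the numbers isolated and the registered deaths.

```python
# Means and standard deviations as printed for the grouped table.
mean_i, sd_i = 475.33, 571.25   # numbers isolated (i)
mean_d, sd_d = 103.42, 118.61   # registered deaths (d)


def coefficient_of_variation(sd, mean):
    """Coefficient of variation in the sense used in the text: S.D. / Mean."""
    return sd / mean


v_i = coefficient_of_variation(sd_i, mean_i)
v_d = coefficient_of_variation(sd_d, mean_d)

# The approximate identity of the two partial correlations neglects terms of
# the order of these squares and products compared with unity.
second_order = {"v_i^2": v_i ** 2, "v_d^2": v_d ** 2, "v_i*v_d": v_i * v_d}

print(f"v_i = {v_i:.2f}, v_d = {v_d:.2f}")
for name, value in second_order.items():
    print(f"{name} = {value:.2f}")
```

Every second-order term exceeds unity here, which is why the identity cannot be relied on for the absolute numbers.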


Correlation of Isolation-Rate I and Case-Mortality m.

[Correlation table: columns are isolation-rate in groups of 10 from 0 to 90; rows are case-mortality in groups of 4 from 4 to 32. Most of the individual cell frequencies are not recoverable from the source; the marginal totals and column means are given below, the few illegible marginal entries being restored from the grand total of 157.]

Isolation-Rate    0—10   10—20  20—30  30—40  40—50  50—60  60—70  70—80  80—90  Totals
Totals             23      4     11     15     12     26     28     22     16     157
Means of m        19.04  14.00  16.18  15.60  14.00  11.08  11.71  11.09   9.25  13.26

Case-Mortality    4—8   8—12  12—16  16—20  20—24  24—28  28—32  Totals
Totals             28    45     44     16     17      6      1     157


The following constants were found for this table:

                                    Isolation-Rate                Case Mortality
Mean                                $\bar{I}$ = [illegible]       $\bar{m}$ = 13.26
Standard Deviation                  $\sigma_I$ = 25.52            $\sigma_m$ = 5.58
Coefficient of Variation            $v_I$ = [illegible]           $v_m$ = 0.42

Correlation Coefficient and Ratio   $r_{Im}$ = −.5291 ± .038      $\eta_{mI}$ = .5546
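
The grouped constants can be checked from the marginal totals of the correlation table alone. The sketch below is ours, written in Python, and rests on two assumptions: that the marginal totals are as best they can be read from the printed table (two column totals and one row total are restored from the grand total of 157), and that each group is represented by its mid-point. Under these assumptions the grouped mean and standard deviation of the case-mortality come out at 13.26 and 5.58, and the standard deviation of the isolation-rate at about 25.5, in agreement with the printed constants; the same calculation gives a mean isolation-rate of roughly 49.5.

```python
from math import sqrt


def grouped_mean_sd(midpoints, frequencies):
    """Mean and standard deviation of a grouped frequency distribution."""
    n = sum(frequencies)
    mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n
    variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / n
    return n, mean, sqrt(variance)


# Case-mortality: groups 4-8, 8-12, ..., 28-32 (row totals, as read).
cm_midpoints = [6, 10, 14, 18, 22, 26, 30]
cm_totals = [28, 45, 44, 16, 17, 6, 1]

# Isolation-rate: groups 0-10, 10-20, ..., 80-90 (column totals, as read).
ir_midpoints = [5, 15, 25, 35, 45, 55, 65, 75, 85]
ir_totals = [23, 4, 11, 15, 12, 26, 28, 22, 16]

n_m, mean_m, sd_m = grouped_mean_sd(cm_midpoints, cm_totals)
n_I, mean_I, sd_I = grouped_mean_sd(ir_midpoints, ir_totals)

print(f"case-mortality: n = {n_m}, mean = {mean_m:.2f}, S.D. = {sd_m:.2f}")
print(f"isolation-rate: n = {n_I}, mean = {mean_I:.2f}, S.D. = {sd_I:.2f}")
```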
* This is the $_a r_{Im}$ of our p. 556.   † The values are given on our p. 554.

Ethel M. Elderton and Karl Pearson 559

The graph of the regression of case mortality on isolation-rate shows small
evidence of skew regression (see Diagram II), and this is again confirmed by the
difference between $r_{Im}$ and $\eta_{mI}$ being fairly small. The marginal frequency
distributions show, however, considerable skewness, and that for the isolation-rate is
lumped up at the end where there is no isolation: more than half the numerator
of $\eta_{mI}$ being contributed by the towns with little or no isolation. It is desirable
to consider these towns further. They have an attack-rate of .76, which is sensibly
less than the mean attack-rate (1.30), but they have a case-mortality of 19.04 as
against the average case-mortality of 13.26; the 17 towns* with no isolation at
all give a case-mortality of 19.4. It would thus appear that the towns with little
or no isolation are those with a lower average attack-rate, but with rare exceptions
their case-mortality is high.


[Diagram II. Regression of case-mortality on isolation-rate.]


To test the influence of these towns with little or no isolation, we have
removed the column 0—10 isolation-rate group and recalculated $r_{Im}$ and $\eta_{mI}$; we
find

$r_{Im} = -.4120 \pm .0484$,   $\eta_{mI} = .4810$.

Thus while the correlations are somewhat reduced by excluding the towns with
little or no isolation there is still in the towns which do isolate a very sensible
relation between the degree of isolation and the case-mortality, and this relation
exhibits rather more skewness.


* South Shields (1st and 2nd Periods), Sunderland (1st and 2nd Periods), Barrow (1st Period),
Preston (1st Period), Wigan (1st Period), Smethwick (1st and 2nd Periods), Walsall (1st and 2nd
Periods), West Bromwich (1st and 2nd Periods), Coventry (1st and 2nd Periods), Barnsley (1st and 2nd
Periods). Of these towns West Bromwich in the 1st period had the highest case-mortality recorded of
any of our 80 towns, while Smethwick in both periods, and Coventry and Barnsley in the 2nd period
with no isolation had case-mortalities below the general average.

We may sum up as follows: The relation between greater isolation and a
lessened case-mortality appears to be a real one. We have shown that it is hardly
due to spurious correlation, as this would have produced a positive correlation and
further no great changes are made when we correct for inequality in the numbers
attacked. The regression is roughly linear and only very partially due to the high
case-mortality in towns with no isolation. It is probable that where there is
a large amount of isolation, the care of patients falls largely into the hands of
a few men with a more extensive experience of the disease, and that this reduces
the case-mortality.


Against this may be set the fact that the correlations between the absolute 
numbers of deaths and of cases isolated for constant numbers attacked are in- 
significant. The divergence between the two methods of approaching the problem 
is, however, explicable because the coefficients of variation of the absolute numbers 
are greater than unity, and the identity of the correlations reached by the two 
methods depends on the neglect of the squares and products of the coefficients 
of variation compared to unity. It may be asked: Why in this case we prefer 
the partial correlation found from the rates to that found from the absolute 
numbers? We reply: Because the partial correlation coefficient for the absolute 
numbers depends on very high total correlations, and if these correlations be, as 
we have shown, non-linear, then the partial correlation coefficient not only loses 
its full meaning, but may, as experience has shown us, easily change its sign as 
well as its magnitude. We would suggest that in a minor sense total mortalities 
and total isolations are bound to give “restricted tables,” for deaths and isolated 
cases are perforce less than the numbers attacked, and that in such “restricted” 
tables, there is a general tendency to skew correlation and to a spurious factor*. 
On the other hand it is true that case-mortalities and isolation-rates cannot
exceed 100% or fall short of 0%, but these limits are the same for every array
and do not vary from array to array as in the previous case. On the whole we
think it safe to say that isolation is associated with greater prevalence of the 
disease and with a lessened case-mortality. 


(5) Is there any significant Relation between Isolation-Rate and General
Diphtheria Death-Rate? We have seen (p. 553) that insignificant correlations exist
between I and M, and it is difficult to understand how a spurious factor could
have modified this result. In the first place the small values of $r_{PI}$ and $r_{PM}$ on
p. 555 show us that the value of $_P r_{IM}$ is sensibly the same as $r_{IM}$; thus, for a
constant population there is no sensible association between diphtheria mortality and
isolation. But now let us ask whether for a constant attack-rate, isolation does
not lessen general diphtheria mortality. We have:


Correlation                                        First Period     Second Period
$r_{IM}$ = Isolation-rate and Death-rate           +.1532           −.0119
$r_{IA}$ = Isolation-rate and Attack-rate          +.4268           +.2905
$r_{MA}$ = Death-rate and Attack-rate              +.6772           +.6879

Hence
$_A r_{IM}$ = Isolation-rate and Death-rate
   for constant Attack-rate                        [illegible]      [illegible]


* See especially the illustrations of such ‘‘restricted” tables and their regression lines in a paper 
by Waite on Finger-Prints: Biometrika, Vol. x, pp. 421—478. 
Both of these values may be considered significant and negative, and hence 
when the attack-rate is constant there is a sensible, if not very close relationship 
between increased isolation and reduced general mortality from diphtheria. 
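
As a check on the arithmetic, the partial correlation for constant attack-rate can be recomputed from the three total correlations tabulated above by the standard first-order formula. The Python sketch below is ours and not the authors'; the function name `partial_correlation` is an illustrative choice, and the inputs are the coefficients as printed. Under these assumptions both partial coefficients come out negative (about −.20 for the first period and −.30 for the second), consistent with the statement that both values are negative.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# x = isolation-rate I, y = death-rate M, z = attack-rate A.
periods = {
    "1904-1908": {"r_IM": 0.1532, "r_IA": 0.4268, "r_MA": 0.6772},
    "1909-1912": {"r_IM": -0.0119, "r_IA": 0.2905, "r_MA": 0.6879},
}

partials = {
    period: partial_correlation(c["r_IM"], c["r_IA"], c["r_MA"])
    for period, c in periods.items()
}

for period, value in partials.items():
    print(f"{period}: partial r_IM for constant A = {value:+.4f}")
```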


This confirms the view already reached that while isolation is associated with 
higher attack-rate its effect is to lessen the number of deaths whether they be 
reckoned as case-mortality or general population death-rate. 


(6) What is the meaning of the Association between Isolation and increasing 
prevalency of Diphtheria? The analysis of this problem is more complicated. 
The obvious answer of those who advocate increasing isolation would be that 
it has been adopted in those districts where the disease is most prevalent, and 
this of course may turn out to be correct. But we may ask in turn upon what 
statistics they depend to demonstrate their view that isolation lessens the preval- 
ence of the disease and is therefore advantageous, if our data demonstrate that 
where there is more isolation, there there is more diphtheria? It can only be by 
an analysis of no simple character that it is possible to deduce from such data 
that the practice of isolation has lessened the amount of the disease. 

There is, however, a preliminary problem to be dealt with. The isolation-rate 
has been increasing very sensibly from 1904 to 1912, the attack-rate has lessened 
although very slightly, the case-mortality has lessened and the mortality on the 
population is considerably less. These facts are exhibited in the following table : 


                                                       Means                     Standard Deviations
Variate                               Symbol    1904-1908    1909-1912      1904-1908    1909-1912
Attack-rate per 1000 population         A       [illegible]  [illegible]      .657         .639
Isolation-rate per 100 attacked         I         42.4         55.7          25.52        25.18
Mortality per 1000 population           M         .174         .138           .080         .061
Mortality per 100 attacked              m         14.6       [illegible]   [illegible]     5.01


Now it may well be, since the attack-rate has changed so little, that in the 
towns with increasing attack-rate there has been increasing isolation, both 
quantities changing with the time, but having no causal relation the one to the 
other. It is of some interest therefore to consider the type of districts in which 
isolation is most practised. In the first place we ask if any known bad social 
conditions are associated with prevalence of diphtheria. We took as our measure
of sanitary conditions (i) the infant death-rate, or the deaths of children under one
year per 1000 births, (ii) overcrowding, or the percentage of the population in
private families with more than two in a room. We found the following results:


                                          First Period      Second Period
Variates Correlated                       1904-1908         1909-1912
Attack-rate and Infant Death-rate         −.206 ± .074      −.206 ± .072
Attack-rate and Overcrowding              −.153 ± .075      −.186 ± .074
These are not very considerable, but they are consistent, and indicate, as 
far as they go, that the incidence of diphtheria is not dependent upon such 
measures as the above of unfavourable sanitary conditions. 


If we now turn to the correlation between the mortality-rate on the population 
and these measures of unfavourable sanitary conditions we find: 


                                                      First Period      Second Period
Variates Correlated                                   1904-1908         1909-1912
Death-rate from Diphtheria and Infant Death-rate      +.081 ± .076      +.118 ± .074
Death-rate from Diphtheria and Overcrowding           +.061 ± .079      +.004 ± .075


All these are indeed positive, but they are of no significance and if they were 
significant would be so small as to be of no importance. The first indeed might 
have been anticipated to show a higher value, for a certain number of deaths from 
diphtheria must be deaths of infants. We can only conclude that as far as these 
measures of unsanitary conditions are concerned they do not in any way determine 
the diphtheria death-rate. 


We now turn to the isolation-rate and find:

                                          First Period      Second Period
Variates Correlated                       1904-1908         1909-1912
Isolation-rate and Infant Death-rate      −.414 ± .064      −.375 ± .065
Isolation-rate and Overcrowding           −.236 ± .073      −.235 ± .071

These are significant although not very large and we conclude that most
isolation is practised in those districts which have the lowest infant death-rate and
the least overcrowding. In other words the towns with better health conditions
have adopted more extensively the practice of isolating diphtheria cases.


It seemed further of interest to determine: (i) whether diphtheria and isolation 
were more or less associated with urban conditions, and we took for this purpose 
the number of persons per acre, and (ii) whether the well-to-do character of the 
district, as measured by the number of domestic servants, indoor and outdoor, 
male and female per 100 private families, has any influence on the incidence of 
mortality from, or the isolation of diphtheria. We found: 


                                          First Period      Second Period
Variates Correlated                       1904-1908         1909-1912
Persons per Acre and Attack-rate          +.165 ± .075      +.043 ± .075
Persons per Acre and Death-rate           +.169 ± .075      +.115 ± .074
Persons per Acre and Isolation-rate       +.073 ± .076      +.053 ± .075


Not one of these correlations is of any importance, if indeed any of them can be
considered significant. It is thus clear that the intensity of urban conditions has
very little to do with the prevalence of diphtheria, for if anything the suburban
conditions have the lesser death-rate; clearly isolation has no sensible relation
to number of persons per acre.
Turning now to our measure of the prosperity of the district, we find that it 
has no influence on the attack-rate, that it sensibly, but not very intensely affects 
the mortality, the higher death-rate occurring in the poorer districts, and that 
isolation is associated quite significantly with the prosperity of the district, i.e. the 
more well-to-do the district the more isolation is practised *. 


                                                      First Period      Second Period
Variates Correlated                                   1904-1908         1909-1912
Number of Domestic Servants and Attack-rate           +.095 ± .076      +.024 ± .075
Number of Domestic Servants and Death-rate            −.219 ± .073      −.308 ± .068
Number of Domestic Servants and Isolation-rate        +.437 ± .062      +.363 ± .065


We conclude therefore that the more prosperous and generally healthier 
districts are associated with fuller isolation, and that the more prosperous, but 
not necessarily the more healthy districts, have the less diphtheria death-rate. 
On the other hand the incidence of the disease seems independent of the prosperity 
or density of population of the district and to be somewhat greater in those towns 
where the sanitary conditions as judged by infant death-rate and overcrowding are 
better. 


Thus as far as our measures go, we must conclude that diphtheria is not to be 
considered as a disease of markedly urban districts, of overcrowded or of insanitary 
districts. It would appear that the more prosperous and healthy districts have 
the greater isolation and that these are subject to somewhat the greater incidence. 


* Of course this may largely mean that the more prosperous towns introduce isolation to remove 
the supposed danger of infection when servants of the families of the well-to-do are attacked. 

† In order to ascertain whether the variates persons per acre ($p_a$) and overcrowding (O) were merely
measures of the size of the town population (P) we correlated P with $p_a$ and with O and found:

$r_{Pp_a}$ = +.404 ± .064 (1904-8), = +.402 ± .063 (1909-12),
$r_{PO}$ = +.091 ± .076 (1904-8), = +.074 ± .075 (1909-12).

Thus overcrowding has no relation to the size of the town, the larger towns do not show more
overcrowding. There is, however, a considerable association of persons per acre with total population,
the larger towns having more persons per acre without exhibiting any more significant overcrowding.
Making the population constant we find:

              First Period 1904-1908                                Second Period 1909-1912
Total Correlation            Partial Correlation            Total Correlation            Partial Correlation
$r_{Ap_a}$ = +.165 ± .075    $_P r_{Ap_a}$ = +.122 ± .076   $r_{Ap_a}$ = +.043 ± .075    $_P r_{Ap_a}$ = +.023 ± .075
$r_{AO}$ = −.153 ± .075      $_P r_{AO}$ = −.167 ± .075     $r_{AO}$ = −.186 ± .074      $_P r_{AO}$ = −.1[?]0 ± .074

Thus correcting for population only makes the relation between persons per acre and incidence still
more insignificant, while the relation between incidence and less overcrowding becomes slightly greater,
without rising to any real importance.

‡ This result must not offhand be extended to subdistricts of our towns, it is an inter-urban and not
intra-urban statement.
It will be seen at once that this conclusion opens up new problems: (i) Is the 
greater isolation the outcome of greater incidence, the only remedy suggestable 
for greater incidence being a more complete isolation? (ii) Is the greater 
incidence in some manner a result of the greater isolation, and does it really tell 
against isolation as a remedy against the spread of diphtheria? The association 
of greater isolation with local prosperity would then be merely a measure of the 
economic capacity of the district for carrying out the accepted sanitary code. 
(iii) If (ii) is to be answered in the negative, then is there any factor in 
prosperity which makes for a greater diphtheritic incidence? The final answers 
to these problems can probably not be given on the basis of the present data. 
The correlations under discussion although significant are not of such a marked 
character as to provide more than provisional statements, or indeed more than 
suggestions for further inquiry and tabulation. 


(7) Does greater Isolation follow increasing Incidence, or greater Incidence 
follow increasing Isolation? The problem is a much more subtle one than appears 
at first sight. What we have established is that those towns with the higher 
isolation-rate have the higher attack-rate. It does not follow from this that the 
individual town which increases its isolation-rate will increase its attack-rate. 
To determine whether this is so we took as our variates: increase in isolation-rate
([symbol illegible]) between the periods 1904-8 and 1909-12*, and the similar increase
([symbol illegible]) of the attack-rate. We found:

$r$ = [value illegible];

a value probably significant, although not quite so large as that found for the
inter-urban relation:

$r_{AI}$ = +.427 ± .063 (for 1904-8),
         = +.290 ± .069 (for 1909-12).


We can, we think, therefore conclude that the towns which increase their isolation- 
rate are those with increasing attack-rate, just as the towns with higher isolation- 
rate are those with higher attack-rate. 


But this does not answer the question as to which is “the cart” and which
“the horse”! Does the increased attack-rate precede or follow the increased
isolation-rate? To answer this question we divided our material into three
periods each of three years, let us say $T_1$, $T_2$ and $T_3$. Then the attack-rate
increase between $T_1$ and $T_2$ was correlated with the isolation-rate in $T_3$ and the
isolation-rate increase between $T_1$ and $T_2$ with the attack-rate in $T_3$. In other
words we asked whether towns with most rapid increase of attack-rate in the


* That is the total number treated in hospital x 100 and divided by the total number of attacks was 
taken for the first period and for the second period, and their difference (second period—first period) 
was treated as increase in isolation-rate. In the same way the sum of the totals attacked for the years 
of the first period x 1000 and divided by the sum of the calculated intercensal populations for the same 
years was treated as the attack-rate, and the difference of second and first period values taken as the 
increase in the attack-rate. 
early periods had most isolation in the later period, and whether the towns with
most rapid increase of isolation-rate in the earlier periods had most incidence in
the later period. We found:

$r_{A_{1-2} I_3}$ = [sign illegible] .004 ± .077,
$r_{I_{1-2} A_3}$ = +.085 ± .077.


Thus there is no significant relation whatever between either increase of attack- 
rate or increase of isolation-rate in the first periods, and the isolation or the 
incidence in the following period. 

As criticism of this result it might, perhaps, be suggested that the correlation
of $A_{1-2}$ and $I_3$ will be influenced by what has been the course of I in the periods
$T_1$ and $T_2$ and the nature of A in $T_3$; we have accordingly, in order to test this,
made the isolation-rate constant in the first two periods and the attack-rate
constant in the third period and find

${}_{I_1 I_2 A_3} r_{A_{1-2} I_3}$ = +.147 ± .076.

This is still of no real significance, although the sign appears to indicate that
where the isolation-rate has been constant then increasing attack-rate in the
earlier period is followed by very slightly more isolation in the third period, even
if the attack-rate in that period be itself constant.

Similarly we determined:

$r_{I_{1-2} A_3}$ (partial, the remaining rates being made constant) = [sign illegible] .077 ± .077.

This coefficient shows that towns which have increased their isolation-rate during
a period of constant attack are not liable to sensibly heavier attack in the
following period.


It would thus seem that our first two problems are both to be answered in the 
negative. Towns which increase their isolation are not those which in the fol- 
lowing period have most incidence, nor are those which have increasing incidence 
markedly those of most isolation in the following period. Attack and isolation 
appear to have no causative relation, and the association we have found between 
more isolation and more incidence seems to be contemporaneous rather than 
successional. We are, it seems, compelled to search for something in the environ- 
ment, which favours incidence and at the same time isolation. The only common 
factor that we have been able to reach at present is the prosperity and general 
healthy condition of the town. Under these circumstances there appears to be 
economic possibility of greater isolation, but why should there be greater incidence ? 
Is it possible that in the more prosperous towns there is greater consumption of 
some easily contaminated commodities, which may act as carriers of the disease, 
or more concourse of those of susceptible ages at places of public amusement or 
instruction ? 


(8) Test of the “organic” nature of the correlation of Isolation- and Attack-rates
by the method of Variate Differences. If the suggestion made at the end
of the last section be correct we should anticipate that by the use of the method
of variate differences we should free ourselves from the influence of the time
factor, if attack-rate and isolation-rate increase simultaneously in the more
prosperous towns, but without organic association. We have nine years’ returns,
but the epidemic nature of diphtheria in many cases does not give one great
confidence in applying the method of variate differences to individual years.
We considered that it would not be wise to deal with smaller intervals than
three-year periods, and should have preferred had the data been available to work
with five-year intervals. As it is, we cannot with three-year intervals for each
town go beyond the second differences. We have accordingly 228 isolation-rates
and attack-rates obtained from 76 towns for each of three three-year periods,
152 first differences, and 76 second differences. We may symbolise them as $I'$ and
$A'$, $\delta_1 I'$ and $\delta_1 A'$, and $\delta_2 I'$ and $\delta_2 A'$. We found the following results:
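
The counts quoted above follow from the differencing itself: three period values per town yield two first differences and one second difference. The Python sketch below is ours and uses purely hypothetical isolation-rates (the real town figures are not reproduced here); the helper name `variate_differences` is an illustrative choice.

```python
def variate_differences(series):
    """Return the first and second differences of one town's period values."""
    first = [later - earlier for earlier, later in zip(series, series[1:])]
    second = [later - earlier for earlier, later in zip(first, first[1:])]
    return first, second


# Hypothetical isolation-rates for 76 towns over three three-year periods.
N_TOWNS = 76
towns = [[40.0 + t, 46.0 + 1.5 * t, 55.0 + t] for t in range(N_TOWNS)]

values = [x for series in towns for x in series]
first_diffs, second_diffs = [], []
for series in towns:
    d1, d2 = variate_differences(series)
    first_diffs.extend(d1)
    second_diffs.extend(d2)

# 76 towns x 3 periods -> 228 values, 152 first and 76 second differences.
print(len(values), len(first_diffs), len(second_diffs))
```

The correlations are then taken between the corresponding differences of the two rates, on the supposition that differencing removes a smooth common trend in time.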


$r_{I'A'}$ = +.332 ± .040,
$r_{\delta_1 I'\,\delta_1 A'}$ = +.236 ± .052,
$r_{\delta_2 I'\,\delta_2 A'}$ = +.159 ± .075.

The first of these results compares reasonably with the previous results for the
first and second periods on p. 552, i.e.

1904-1908:  $r_{IA}$ = +.427 ± .063,
1909-1912:  $r_{IA}$ = +.290 ± .069,

with a mean value of +.358. And this is the more true because the values of
$r_{IA}$ were found by the product moment method without grouping, while $r_{I'A'}$ was
obtained from grouping in a correlation table*.
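
The ± figures attached to the three coefficients for the pooled values, first differences and second differences can be checked on the assumption, not stated in this section, that they are probable errors of the usual form .6745(1 − r²)/√n. The Python sketch below is ours; it takes n as 228, 152 and 76 respectively and reads the first-difference coefficient as .236, both of which are assumptions about the printed figures.

```python
from math import sqrt


def probable_error(r, n):
    """Probable error of a correlation coefficient r from n observations."""
    return 0.6745 * (1 - r * r) / sqrt(n)


# (label, coefficient as read, assumed number of observations)
cases = [
    ("pooled values", 0.332, 228),
    ("first differences", 0.236, 152),
    ("second differences", 0.159, 76),
]

errors = [round(probable_error(r, n), 3) for _, r, n in cases]
for (label, r, n), pe in zip(cases, errors):
    print(f"{label}: r = {r:+.3f}, probable error = {pe:.3f} (n = {n})")
```

On these assumptions the computed values round to .040, .052 and .075, matching the printed ± figures.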


* It may be noted that $r_{\delta_1 I'\,\delta_1 A'}$ was also found from a correlation table, but $r_{\delta_2 I'\,\delta_2 A'}$, as having only
76 cases, by product moment without grouping.

Now the above values bring out very markedly that when we endeavour to
remove the influence of the time factor and to obtain a purely organic relationship
between I and A, we more than halve the correlation between them by proceeding
to the second difference only. If we might suppose that a hyperbola would give
the asymptotic value of $r_{\delta_s I'\,\delta_s A'}$ from the above three known correlations we should
have

$$ r_{\delta_s I'\,\delta_s A'} = \frac{7.084}{8.105 + s} - .542, $$

which indicates, although no stress can be laid on actual numbers, that at about
the fifth difference $r_{\delta_s I'\,\delta_s A'}$ would tend to become negative. All we think it
possible to say would be that if the time factor be eliminated there is very little
positive organic association between high isolation-rate and high attack-rate to be
cleared up, certainly not more than is indicated by the correlation on p. 565:

$$ {}_{I_1 I_2 A_3} r_{A_{1-2} I_3} = +.147 \pm .076, $$

which seems to suggest that other things being constant increasing incidence is to
some slight extent followed, probably as the only suggested remedy, by the
higher isolation rates*.
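
The hyperbola used in section (8) can be recovered by passing a curve of the form r(s) = β/(γ + s) − α through the three known correlations at s = 0, 1 and 2. The Python sketch below is ours; the functional form, the symbol s for the order of difference, and the reading of the three coefficients as .332, .236 and .159 are assumptions made for illustration. Under them the constants come out close to 7.084, 8.105 and .542, and the curve crosses zero just before s = 5.

```python
def fit_hyperbola(r0, r1, r2):
    """Fit r(s) = beta / (gamma + s) - alpha through s = 0, 1, 2."""
    d01 = r0 - r1          # equals beta / (gamma * (gamma + 1))
    d12 = r1 - r2          # equals beta / ((gamma + 1) * (gamma + 2))
    gamma = 2 * d12 / (d01 - d12)
    beta = d01 * gamma * (gamma + 1)
    alpha = beta / gamma - r0
    return alpha, beta, gamma


def hyperbola(s, alpha, beta, gamma):
    """Value of the fitted curve at the s-th difference."""
    return beta / (gamma + s) - alpha


alpha, beta, gamma = fit_hyperbola(0.332, 0.236, 0.159)
zero_crossing = beta / alpha - gamma   # order of difference at which r(s) = 0

print(f"beta = {beta:.3f}, gamma = {gamma:.3f}, alpha = {alpha:.3f}")
print(f"r(s) changes sign near s = {zero_crossing:.2f}")
```

The asymptotic value as s increases is −α, so no weight should be placed on the number itself; the fit only illustrates how quickly the positive correlation decays.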


(9) Can any other Factors be determined which measure the Relation between
urban conditions and the Incidence of Diphtheria? It is worth while from this
standpoint to place the towns with which we have dealt in the order of incidence,
each town being credited with the mean of the three attack-rates for each of
three-year periods. Now an examination of the four columns of this table shows
that, with the exception of Oxford, which has a child incidence (.89 as compared
to .70) considerably above the population incidence owing to relatively
few children, the towns with the least diphtheria are the Midland, and
particularly the Northern manufacturing towns. These constitute practically the
whole of the first column of 19 towns. The last column contains the big ports
and certain suburban metropolitan districts, indeed all these for which we have
data except Plymouth, Devonport and Tottenham fall into the second half of the

Seventy-six Towns in order of their Diphtheria Incidence Rates 1904-1912.

[The rates for the fourth column are not recoverable from the source.]

 1 West Bromwich   (.40) | 20 Rotherham     (.91)  | 39 Birkenhead       (1.20) | 58 Brighton
 2 Northampton     (.45) | 21 South Shields (.92)  | 40 Rhondda          (1.24) | 59 Stockton
 3 Wigan           (.48) | 22 Preston       (.93)  | 41 Smethwick        (1.25) | 60 Grimsby
 4 Walsall         (.49) | 23 Wallasey      (.94)  | 42 Barrow           (1.25) | 61 Leyton
 5 Stockport       (.53) | 24 Bath          (.95)  | 43 Newport          (1.25) | 62 West Ham
 6 Oldham          (.59) | 25 Bootle        (.96)  | 44 Wimbledon        (1.30) | 63 Salford
 7 Bolton          (.59) | 26 York          (.99)  | 45 Great Yarmouth   (1.31) | 64 Nottingham
 8 Oxford          (.70) | 27 Blackpool     (.99)  | 46 Southend-on-Sea  (1.32) | 65 St Helens
 9 Barnsley        (.71) | 28 Tynemouth     (1.00) | 47 Birmingham       (1.32) | 66 Walthamstow
10 Southport       (.72) | 29 Tottenham     (1.02) | 48 Gillingham       (1.34) | 67 Ilford
11 Rochdale        (.73) | 30 Halifax       (1.03) | 49 Ipswich          (1.36) | 68 Southampton
12 Leicester       (.76) | 31 Sheffield     (1.03) | 50 Liverpool        (1.37) | 69 Cardiff
13 Manchester      (.79) | 32 Plymouth      (1.07) | 51 Hornsey          (1.39) | 70 Enfield
14 Bury            (.80) | 33 Coventry      (1.11) | 52 Darlington       (1.39) | 71 Hull
15 Blackburn       (.83) | 34 Warrington    (1.11) | 53 Acton            (1.43) | 72 Bristol
16 Wolverhampton   (.86) | 35 Devonport     (1.13) | 54 Newcastle        (1.45) | 73 Croydon
17 Burnley         (.86) | 36 Sunderland    (1.14) | 55 Burton on Trent  (1.48) | 74 Portsmouth
18 Huddersfield    (.89) | 37 Bournemouth   (1.17) | 56 Bradford         (1.56) | 75 Derby
19 Wakefield       (.91) | 38 Middlesbrough (1.20) | 57 Willesden        (1.56) | 76 Lincoln


* It is perhaps worth while putting on record the additional statistical constants obtained in
deducing the above correlations, as they are probably fairly reliable values and should be compared
with the two period constants on p. 561:

$A'$:            Mean Attack-rate 1.26;                 Standard Deviation, Attack-rate .655
$I'$:            Mean Isolation-rate 47.75;             Standard Deviation, Isolation-rate 26.341
$\delta_1 A'$:   Mean Increase in Attack-rate −.086;    Standard Deviation of change in Attack-rate .648
$\delta_1 I'$:   Mean Increase in Isolation-rate 9.03;  Standard Deviation of Increase in Isolation-rate 1.05

Thus while most towns have been sensibly increasing their amount of isolation by 17% to 18% of
its mean value, the decrease in the attack-rate has only been 6% to 7% of the mean incidence, and
the correlations show that this decrease has not occurred in the towns with marked increase of
isolation.
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table, and there can be little doubt that on the whole sea-port conditions and 
the big new neighbourhoods round London favour, while manufacturing con- 
ditions restrict, the incidence of diphtheria. We have not data, however, available 
upon which we could test water and milk supply, or extent of consumption of 
milk and fish in these towns. The results for Derby and Lincoln are remarkable, 
but they are high for all three periods, and this notwithstanding the rapid 
increase of isolation in those towns. 


At first sight it seemed to us that the towns in the first column were markedly 
those in which there had been a greatly restricted birthrate*, while those in the 
last column were towns of greater fertility. Taking the births per 100 married 
women from 15 to 45 (B) we found: 


$$r_{AB} = +.013 \pm .075.$$


Thus there is no association between incidence and the well-to-do character of a 
town as estimated by a low birthrate. 


Again having regard to the character of the towns in our first column, it 
occurred to us to test the incidence in relation to the employment of males in 
manufacturing processes involving smoke. We took out of the 1911 census the 
percentage (S) of males over 10 years of age, who fell under a rough test of
smoke-producing occupations, namely IX. 1, X. 1-2, 5-8, XIV. 1, XV. and XVIII.
1-6 of the Registrar-General’s classification, and we found:


$$r_{AS} = -.180 \pm .073.$$


This is possibly significant and would undoubtedly be emphasised had we 
included as a factor the women engaged in textile industries. There seems 
therefore some slight reason to suppose that the conditions favourable to smoke 
production are unfavourable to the spread of diphtheria. 
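
As a rough gauge of the two coefficients just quoted, each may be compared with its own probable error. The sketch below does only that arithmetic. The function name is hypothetical, the smoke coefficient is entered as negative because the text describes smoke as unfavourable to diphtheria, and no particular threshold of significance is assumed beyond what the text itself says ("possibly significant").

```python
def ratio_to_probable_error(r, probable_error):
    """Return |r| divided by its probable error (how many probable errors r is from zero)."""
    return abs(r) / probable_error

# Coefficients and probable errors as quoted in the text.
birthrate_ratio = ratio_to_probable_error(0.013, 0.075)   # incidence against births per 100 married women
smoke_ratio = ratio_to_probable_error(-0.180, 0.073)      # incidence against smoke-producing occupations

print(f"birthrate: {birthrate_ratio:.2f} probable errors from zero")
print(f"smoke:     {smoke_ratio:.2f} probable errors from zero")
```

On these figures the birthrate coefficient is a small fraction of its probable error, while the smoke coefficient is about two and a half times its probable error, which agrees with the wording of the text.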


If the data could be procured, it would be worth while considering water and 
milk supply and the extent of fish consumption in the towns we have dealt with. 
If these were found to be of little influence, the road would certainly be clearer 
for dealing with the chronic diphtheritic human carrier as the chief source of 
the spread of the diphtheria bacillus. 


(10) Conclusions. 


(a) No influence of greater isolation in reducing the attack-rate from diph- 
theria is discoverable. In fact there is a sensible, if not large, positive association 
between the isolation-rate and attack-rate. 


(b) The case mortality is somewhat less where there is more isolation. This 
may very probably be accounted for by more cases coming under specialised medical 
care. 


* We had partially in view here also the possibility that restricted birthrate meant employment of 
women and thus less breast-feeding and greater use of milk, so that cross-currents might be at work. 


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(c) The attack-rate appears to be greater in the more prosperous towns and 
in towns of somewhat better sanitary conditions. We have not found the pre- 
valence of diphtheria associated with overcrowding or with the conditions leading 
to high infant mortality. 


(d) While a low birthrate, taken either as a measure of prosperity, or as 
a measure of the employment of women and so of the prevalence of hand feeding, 
appears to have no significance for the attack-rate of diphtheria, smoke-producing 
manufactures are probably unfavourable to the prevalence of the disease, which 
appears to attach itself in the main to the large ports and metropolitan suburban 
districts. 


(e) The association between the attack- and isolation-rates observed is not 
very significant, and while it might, to a very small extent, be due to increased 
isolation following or accompanying increased attack, it is more probably an
association due to the more prosperous towns practising more isolation, and also 
to there being some element in prosperity which assists the spread of the disease. 


Generally all the correlations are of a low order; they contain, however, 
nothing to support the theory that isolation markedly limits the incidence of 
diphtheria ; the disease itself does not appear where overcrowding is greatest nor 
where the population is most dense; on the other hand isolation is most practised 
in those towns where domestic servants are most common and which may be 
supposed to be most prosperous. The chief argument for isolation—which can be 
drawn from the present data—is a lessened case-mortality, but such mortality 
might be obtained in all probability by specialised medical service as apart from 
isolation. 


MISCELLANEA. 


I. On the Probable Error of a Coefficient of Mean Square 
Contingency. 


By KARL PEARSON, F.R.S. 


LET the sampled population be considered as to two variates and be represented by the total
$M$ and the cell-frequency $m_{pq}$ for the $p$th row and $q$th column cell. Further let the vertical
marginal frequencies be given by $m_{\cdot q}$ and the horizontal marginal frequencies by $m_{p\cdot}$, so that

$$m_{1q} + m_{2q} + \ldots + m_{pq} + \ldots = m_{\cdot q},$$

$$m_{p1} + m_{p2} + \ldots + m_{pq} + \ldots = m_{p\cdot}.$$


Let the corresponding quantities for the sample be $N$, $n_{pq}$, $n_{\cdot q}$ and $n_{p\cdot}$.


Then we know that the mean square contingency $\phi^2$ is given by

$$\phi^2 = S\left\{\frac{\left(n_{pq} - N\,\dfrac{m_{\cdot q}\, m_{p\cdot}}{M^2}\right)^2}{N \times N\,\dfrac{m_{\cdot q}\, m_{p\cdot}}{M^2}}\right\} \qquad \text{(i)},$$

summed for every cell.

Now in the great bulk of statistical phenomena we do not know more of the sampled population
than is given by the sample, and thus to determine $\phi^2$ we must put $m_{\cdot q}/M$ and $m_{p\cdot}/M$ equal to
the most probable values known to us*, namely, $n_{\cdot q}/N$ and $n_{p\cdot}/N$. Doing this we obtain the usual
value for the mean square contingency

$$\phi^2 = S\left\{\frac{\left(n_{pq} - \dfrac{n_{\cdot q}\, n_{p\cdot}}{N}\right)^2}{n_{\cdot q}\, n_{p\cdot}}\right\} \qquad \text{(ii)}.$$
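
The usual value (ii) is easily computed from a table of cell frequencies. The following is a minimal sketch, assuming the table is given as a list of rows with no empty row or column; the function name and the small illustrative tables are hypothetical and are not data from the paper.

```python
def mean_square_contingency(table):
    """Mean square contingency by formula (ii): S{(n_pq - n_.q n_p./N)^2 / (n_.q n_p.)}."""
    row_totals = [sum(row) for row in table]            # n_p.
    col_totals = [sum(col) for col in zip(*table)]      # n_.q
    n_total = sum(row_totals)                           # N
    phi2 = 0.0
    for p, row in enumerate(table):
        for q, n_pq in enumerate(row):
            product = row_totals[p] * col_totals[q]     # n_.q * n_p.
            phi2 += (n_pq - product / n_total) ** 2 / product
    return phi2

# Hypothetical tables: the first shows slight association, the second exact independence.
phi2_associated = mean_square_contingency([[10, 20], [30, 40]])
phi2_independent = mean_square_contingency([[2, 4], [3, 6]])
print(phi2_associated, phi2_independent)
```

Multiplying the result by $N$ gives the familiar $\chi^2$ of the table, which is one way of checking the arithmetic.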


Starting from (ii) Blakeman and Pearson have found† the probable error of the mean square
contingency. The process is admittedly very laborious and although it has now been used fairly 
often, it must be confessed that its chief value is to obtain appreciation of the probable errors of 
contingency coefficients in general, rather than in any usefulness in recording significant differences 
between long series of individual coefficients. 


But it has not been sufficiently recognised that the probable error thus found is that of the 
approximate value of the mean square contingency (ii) and not that of its true value (i). It is 
indeed the probable error of the expression actually used, but it is not the probable error of the 
true value as given by (i). The latter is easy to find and deserves consideration. Let us write 


$$\frac{m_{\cdot q}\, m_{p\cdot}}{M^2} = \mu_{pq},$$

then

$$\phi_t^2 = S\left\{\frac{(n_{pq} - N\mu_{pq})^2}{N \times N\mu_{pq}}\right\} \qquad \text{(iii)},$$


* Pearson, Philosophical Magazine, Vol. L. 1900, pp. 164 et seq.
† Biometrika, Vol. V. pp. 191 et seq.
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where we shall use $\phi_t^2$ and $\phi_a^2$ for the true and approximate values of the mean square
contingency. Thus

$$\phi_t^2 = S\left(\frac{n_{pq}^2}{N^2\mu_{pq}}\right) - 1.$$

Now for a sample of constant size $\mu_{pq}$ is constant and therefore representing small deviations
by differentials

$$\delta\phi_t^2 = \frac{2}{N^2}\, S\left(\frac{n_{pq}\,\delta n_{pq}}{\mu_{pq}}\right).$$

Square, add for all samples and divide by the number of such samples and we have

$$\sigma_{\phi_t^2}^2 = \frac{4}{N^4}\, S\left(\frac{n_{pq}^2}{\mu_{pq}^2}\,\sigma_{n_{pq}}^2\right) + \frac{8}{N^4}\,\Sigma\left(\frac{n_{pq}\, n_{p'q'}}{\mu_{pq}\,\mu_{p'q'}}\,\sigma_{n_{pq}}\,\sigma_{n_{p'q'}}\, r_{n_{pq} n_{p'q'}}\right),$$

where $\sigma_{n_{pq}}$ is the standard deviation of $n_{pq}$ and $r_{n_{pq} n_{p'q'}}$ is the correlation of deviations in $n_{pq}$ and
$n_{p'q'}$; $S$ is a summation for every cell and $\Sigma$ for every pair of cells.


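But it is well known* that

$$\sigma_{n_{pq}}^2 = \chi N\,\frac{m_{pq}}{M}\left(1 - \frac{m_{pq}}{M}\right), \qquad \sigma_{n_{pq}}\,\sigma_{n_{p'q'}}\, r_{n_{pq} n_{p'q'}} = -\chi N\,\frac{m_{pq}\, m_{p'q'}}{M^2},$$

where $\chi$ is the factor $1 - (N-1)/(M-1)$, usually put unity, since $M$ is as a rule large compared
with $N$, and which will be here put unity for the remainder of the work.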


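Hence

$$\sigma_{\phi_t^2}^2 = \frac{4}{N^3}\, S\left(\frac{n_{pq}^2\, m_{pq}}{\mu_{pq}^2\, M}\right) - \frac{4}{N^3}\, S\left(\frac{n_{pq}^2\, m_{pq}^2}{\mu_{pq}^2\, M^2}\right) - \frac{8}{N^3}\,\Sigma\left(\frac{n_{pq}\, n_{p'q'}\, m_{pq}\, m_{p'q'}}{\mu_{pq}\,\mu_{p'q'}\, M^2}\right)$$

$$= \frac{4}{N^3}\left[ S\left(\frac{n_{pq}^2\, m_{pq}}{\mu_{pq}^2\, M}\right) - \left\{ S\left(\frac{n_{pq}\, m_{pq}}{\mu_{pq}\, M}\right)\right\}^2 \right] \qquad \text{(iv)}.$$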
diy Mp ng N Mpyq w) 


This is the standard deviation of the true value of the mean square contingency, and in most
cases will be of no service, for we do not know the true values of $m_{pq}$ and $\mu_{pq}$.
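
The moments of the cell frequencies quoted above as well known, on which (iv) rests, can be verified exactly on a toy case. The sketch below takes the finite-population factor as unity (sampling with replacement) and enumerates every ordered sample of 4 draws over 3 cells. The cell proportions are hypothetical stand-ins for $m_{pq}/M$, and the function name is an assumption of this illustration.

```python
from fractions import Fraction
from itertools import product

def exact_moments(probs, n_draws):
    """Enumerate all ordered samples; return exact means and covariances of the cell counts."""
    k = len(probs)
    mean = [Fraction(0)] * k
    second = [[Fraction(0)] * k for _ in range(k)]
    for sample in product(range(k), repeat=n_draws):
        weight = Fraction(1)
        counts = [0] * k
        for cell in sample:
            weight *= probs[cell]        # probability of this ordered sample
            counts[cell] += 1
        for i in range(k):
            mean[i] += weight * counts[i]
            for j in range(k):
                second[i][j] += weight * counts[i] * counts[j]
    cov = [[second[i][j] - mean[i] * mean[j] for j in range(k)] for i in range(k)]
    return mean, cov

probs = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]   # hypothetical m_pq / M
n_draws = 4                                                 # sample size N
mean, cov = exact_moments(probs, n_draws)
print(cov[0][0], cov[0][1])
```

Each variance comes out as $N\,p\,(1-p)$ and each covariance as $-N\,p\,p'$, with $p = m_{pq}/M$, which are the two expressions used in passing to (iv).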


If we put these equal to the values obtained from the actual sample under consideration we
obtain the approximate value of the standard deviation of the true mean square contingency,
which we may represent by the symbol $(\sigma_{\phi_t^2})_a$ and compare with what Blakeman and Pearson
found, i.e. $(\sigma_{\phi_a^2})_a$. Thus our alternatives are

$$\phi_a^2 \pm .67449\,(\sigma_{\phi_t^2})_a$$

and

$$\phi_a^2 \pm .67449\,(\sigma_{\phi_a^2})_a.$$

The real thing is

$$\phi_t^2 \pm .67449\,\sigma_{\phi_t^2}.$$

Shall we obtain a better insight into the variation of this by taking the approximate values of
both $\phi_t^2$ and $\sigma_{\phi_t^2}$, or by taking the probable error of $\phi_a^2$? The problem is a subtle one, and,
perhaps, only to be solved by experiment, not by theory. Of course when we take numerous
samples and calculate $\phi_a^2$, then $\sigma_{\phi_a^2}$ will measure their variability. But this is not what we seek.
We use $\phi_a^2$ as an approximation to $\phi_t^2$, and it is the variability of the true value that we want.
Are we not right in choosing $(\sigma_{\phi_t^2})_a$ as its best value? In short would not, on the average of a
great number of samples, $(\sigma_{\phi_t^2})_a$ give us a closer result to $\sigma_{\phi_t^2}$ than $(\sigma_{\phi_a^2})_a$?


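Returning now to equation (iv) and putting in the observed values for $m_{pq}$, $\mu_{pq}$ we have

$$(\sigma_{\phi_t^2})_a^2 = \frac{4}{N}\left[ S\left(\frac{N\, n_{pq}^3}{(n_{\cdot q}\, n_{p\cdot})^2}\right) - \left\{ S\left(\frac{n_{pq}^2}{n_{\cdot q}\, n_{p\cdot}}\right)\right\}^2 \right] \qquad \text{(v)}.$$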


* The values here given are the true values before we approximate by putting $m_{pq}/M = n_{pq}/N$, etc.
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Or, after some reductions 


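$$(\sigma_{\phi_t^2})_a^2 = \frac{4}{N}\left\{\psi_a^2 + \phi_a^2 - \phi_a^4\right\} \qquad \text{(vi)},$$

where

$$\psi_a^2 = S\left\{\frac{N\left(n_{pq} - \dfrac{n_{\cdot q}\, n_{p\cdot}}{N}\right)^3}{(n_{\cdot q}\, n_{p\cdot})^2}\right\} \qquad \text{(vii)}$$

and

$$\phi_a^2 = S\left\{\frac{\left(n_{pq} - \dfrac{n_{\cdot q}\, n_{p\cdot}}{N}\right)^2}{n_{\cdot q}\, n_{p\cdot}}\right\} \qquad \text{(viii)}.$$

Again we have

$$(\sigma_{\phi_t})_a = \frac{1}{\sqrt{N}}\left\{\frac{\psi_a^2}{\phi_a^2} + 1 - \phi_a^2\right\}^{1/2} \qquad \text{(ix)}.$$

Now what we usually need is the probable error of the contingency coefficient

$$C_t = \phi_t/(1 + \phi_t^2)^{1/2}.$$

But

$$\sigma_{C_t} = \sigma_{\phi_t}\,(1 - C_t^2)^{3/2} = \sigma_{\phi_t}/(1 + \phi_t^2)^{3/2}.$$

Thus the probable error of the coefficient of mean square contingency

$$.67449 \times \sigma_{C_t} = \frac{.67449}{\sqrt{N}}\left\{\frac{\psi_a^2/\phi_a^2 + 1 - \phi_a^2}{(1 + \phi_a^2)^3}\right\}^{1/2} \qquad \text{(x)}.$$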


This expression is much simpler than that for the probable error of the actually used value 
as given by Blakeman and Pearson*. It is not, however, asserted that it possesses greater 
theoretical validity. Those authors illustrate their formula by calculating the probable error of 
the contingency coefficient in the case of the association of handwriting and general intelligence 
in 1801 schoolgirls. They find 

$$C = .2957 \pm .0192.$$

In the course of their work they deduce

$$\phi_a^2 = .09580, \qquad (\sigma_{\phi_a})_a = .03268, \qquad \psi_a^2 = .14865.$$

Using these values we have from (ix)

$$(\sigma_{\phi_t})_a = \frac{1}{\sqrt{1801}}\left\{\frac{.14865}{.09580} + .90420\right\}^{1/2} = .0369.$$

It is clear therefore that $(\sigma_{\phi_t})_a$ does not differ very substantially from $(\sigma_{\phi_a})_a$. Calculating
from (x) the probable error of $C_t$, we find it = .0217, while the Blakeman-Pearson process gave
.0192. The two values only differ by .0025, which is unlikely to be of importance in the case of
most inferences in practical statistics.
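
The arithmetic of this illustration can be retraced from (ix) and (x). The sketch below assumes only the three quantities quoted in the text ($\phi_a^2$, $\psi_a^2$ and $N = 1801$); the function names are hypothetical.

```python
import math

def sd_phi(phi2, psi2, n):
    """Formula (ix): standard deviation of phi, from phi^2, psi^2 and the sample size N."""
    return math.sqrt(psi2 / phi2 + 1.0 - phi2) / math.sqrt(n)

def probable_error_of_c(phi2, psi2, n):
    """Formula (x): .67449 times the standard deviation of the contingency coefficient C."""
    return 0.67449 * sd_phi(phi2, psi2, n) / (1.0 + phi2) ** 1.5

# Values quoted in the text for the handwriting and intelligence table.
phi2, psi2, n = 0.09580, 0.14865, 1801

c = math.sqrt(phi2 / (1.0 + phi2))        # contingency coefficient
sd = sd_phi(phi2, psi2, n)
pe = probable_error_of_c(phi2, psi2, n)

print(f"C = {c:.4f}, sd of phi = {sd:.4f}, probable error of C = {pe:.4f}")
```

To four decimals this gives the contingency coefficient .2957, the standard deviation .0369 and the probable error .0217 stated above.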


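Beyond the knowledge of $\phi_a^2$ only $\psi_a^2$ is required by the present process. This may be written

$$\psi_a^2 = \frac{1}{N}\, S\left\{ \frac{\left(n_{pq} - \dfrac{n_{\cdot q}\, n_{p\cdot}}{N}\right)^2}{\dfrac{n_{\cdot q}\, n_{p\cdot}}{N}} \times \frac{n_{pq} - \dfrac{n_{\cdot q}\, n_{p\cdot}}{N}}{\dfrac{n_{\cdot q}\, n_{p\cdot}}{N}} \right\}.$$

In finding the mean square contingency $\phi_a^2$, however, the three expressions

$$\frac{\left(n_{pq} - \dfrac{n_{\cdot q}\, n_{p\cdot}}{N}\right)^2}{\dfrac{n_{\cdot q}\, n_{p\cdot}}{N}}, \qquad n_{pq} - \frac{n_{\cdot q}\, n_{p\cdot}}{N} \qquad \text{and} \qquad \frac{n_{\cdot q}\, n_{p\cdot}}{N}$$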


* loc. cit. p. 194. 
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must have been written down for each cell and thus $\psi_a^2$ can be readily calculated. We can also
treat $\psi_a^2$ as given by

$$\psi_a^2 = S\left(\frac{N\, n_{pq}^3}{(n_{\cdot q}\, n_{p\cdot})^2}\right) - 1 - 3\phi_a^2,$$

but the cubing of the often rather large cell frequencies is troublesome, just as it is rather more
troublesome to calculate

$$\phi_a^2 = S\left(\frac{n_{pq}^2}{n_{\cdot q}\, n_{p\cdot}}\right) - 1$$

than

$$\phi_a^2 = \frac{1}{N}\, S\left\{\frac{\left(n_{pq} - \dfrac{n_{\cdot q}\, n_{p\cdot}}{N}\right)^2}{\dfrac{n_{\cdot q}\, n_{p\cdot}}{N}}\right\},$$

owing to the largeness of the squares in the former expression.
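
That the two ways of reaching $\psi_a^2$ agree can be checked on any table. The sketch below uses exact rational arithmetic on a hypothetical 2 by 3 table (not data from the paper) and compares the form in deviations from the independence values with the form in cubed cell frequencies; the function names are assumptions of this illustration.

```python
from fractions import Fraction

def margins(table):
    """Return row totals n_p., column totals n_.q and the grand total N."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return rows, cols, sum(rows)

def phi2(table):
    """phi_a^2 = S(n_pq^2 / (n_.q n_p.)) - 1."""
    rows, cols, _ = margins(table)
    return sum(Fraction(n_pq ** 2, rows[p] * cols[q])
               for p, row in enumerate(table) for q, n_pq in enumerate(row)) - 1

def psi2_from_deviations(table):
    """psi_a^2 = S{ N (n_pq - n_.q n_p./N)^3 / (n_.q n_p.)^2 }."""
    rows, cols, n = margins(table)
    total = Fraction(0)
    for p, row in enumerate(table):
        for q, n_pq in enumerate(row):
            prod = rows[p] * cols[q]
            total += n * (n_pq - Fraction(prod, n)) ** 3 / prod ** 2
    return total

def psi2_from_cubes(table):
    """psi_a^2 = S( N n_pq^3 / (n_.q n_p.)^2 ) - 1 - 3 phi_a^2."""
    rows, cols, n = margins(table)
    cubes = sum(Fraction(n * n_pq ** 3, (rows[p] * cols[q]) ** 2)
                for p, row in enumerate(table) for q, n_pq in enumerate(row))
    return cubes - 1 - 3 * phi2(table)

table = [[12, 5, 3], [4, 9, 7]]   # hypothetical cell frequencies
print(psi2_from_deviations(table), psi2_from_cubes(table))
```

The two results are identical fractions, so the choice between the forms is purely one of computational convenience, as the text remarks.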


II. Measurements of Medieval English Femora. 


In a forthcoming memoir on the English Long Bones there will be a good deal to be said 
about the conclusions reached by Dr Parsons in his recent paper on the Rothwell femora. 
Meanwhile he has started an attack on the Biometric School in a Journal whose columns are 
not open to adequate reply,—i.e. to a reply of not greater length than the published attack— 
from members of that school. In his communication he suggested that I was unacquainted with 
the condition of affairs at Rothwell, and behind this charge tried to escape any answers to
the essential questions I asked him, and thus those questions still remain unanswered. 


The communication I made ran as follows : 


My informant who I hope is trustworthy speaks of (i) “the great mass of bones beneath the 
church at Rothwell” and (ii) of “the great collection of human bones beneath the old parish 
church at Rothwell” ; further (iii) “there are probably some 5000 or 6000 individuals represented 
in the vault at Rothwell, either altogether or in part”; and again (iv) “The stack varies in 
height and breadth, but is nowhere as high or broad as that at Hythe, although it is much 
longer. I know that at Hythe there are the remains of rather over 4000 people,....... I think 
that this collection contains more than this, partly because the stack is so much longer, partly 
because the bones are so much more decomposed and have therefore settled more.” 


Manouvrier after much piecing and mending while only able to measure the lengths of about 
16 femora from the neolithic burial places of Montigny and Esbly, was yet able to determine the 
pilastric index of 127, and the platymeric index in 127 bones, that is to say in eight times as 
many bones as those for which he could obtain the maximum length. And had he dealt fully 
with the head and neck and the popliteal region, the multiplying factor would probably have 
been ten. Had piecing, mending and a maximum of care in handling been used, I can hardly 
believe that what Manouvrier achieved at Montigny was not possible for Dr Parsons at Rothwell. 


Dr Parsons writes: “If the remains of femurs, whether they are fit or unfit for measurement, 
are counted it will be found that females are quite as numerous as males though measurable 
male femurs from their stronger build are less liable to break in being extricated from the pile of
bones, and so there are more of them available for measurement.” The italics are mine. 

Much depends on the method of ‘extrication,’ and if the capacity of a bone to stand a hole 
being drilled in it with a bradawl be part of the necessary fitness for measurement then the 
number might undoubtedly be limited. But trusting to what I know has been achieved by the 
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French, I feel convinced that if Dr Parsons could measure 277 femoral heads where the femoral 
length was measurable, he could easily have measured 2000 heads in all and thus have ascer- 
tained, definitely, whether his Rothwell series is unique in showing a significant depression in 
frequency between 45 and 47 mm. Further he could on such material by dealing with numbers
8 to 10 times those he has provided have given definite answers to many of the problems con- 
cerning platymery and the pilastric and popliteal indices, which other observers have been vainly 
trying to solve on far less adequate and in many cases far more fragmentary material. 


I would note that Dr Parsons gives no reply at all to my question of why he used Dwight’s 
measurements as a criterion of sex when they referred to bones with the cartilages attached, 
because without this reply his careful attention to ‘other points’ when the head fell between 
45 and 47 mm. seems one-sided, and of no value in sexing the collection as a whole. He 
further gives no reply whatever to my question of why it is the male end, not the dwarf end, 
of his female distribution which is lacking, if absence of females be due to breakage. 


I would also state (i) that I have not sexed the Rothwell bones and therefore cannot say how 
far I should or should not agree with Dr Parsons. Dr Lee using the best available mathematical 
process found 145 ♀s and 133 ♂s, while Dr Parsons has 103 ♀s and 174 ♂s. How this shows any
agreement I fail to perceive; (ii) that I have made no assertion about the bones being of the
13th and 14th centuries. I merely headed my letter with Dr Parsons’ heading “Measurements
of Medieval English Femora,” and asked why, if Dr Parsons holds these bones to be such, he 
considers them without cartilages comparable with the mixed results of a modern American 
dissecting room plus the cartilages. 
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